Label-Noise Reduction with Support Vector Machines

Sergiy Fefilatyev, Matthew Adam Shreve, Kurt Kramer, Lawrence Hall, Dmitry Goldgof, Rangachar Kasturi, Kendra L. Daly, Andrew Walker Remsen, Horst Bunke

Research output: Other contribution

Abstract

The problem of detecting label noise in large datasets is investigated. We consider applications where data are susceptible to label error and a human expert is available to verify a limited number of labels in order to cleanse the data. We show that the support vectors of a Support Vector Machine (SVM) contain almost all of the noisy labels, so verifying the support vectors allows efficient cleansing of the data. Empirical results are presented for two experiments. In the first, artificial random noise is applied to the labels of two datasets from the character recognition domain. In the second, a large dataset of plankton images that contains inadvertent human label error is considered. Up to 99% of all label noise in such datasets can be detected by verifying just the support vectors of the SVM classifier.
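The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: it assumes synthetic two-cluster data, a 10% random label-flip rate, and a small linear soft-margin SVM trained by full-batch subgradient descent. All function names and parameter values here are hypothetical. Support vectors are approximated as the training points whose margin y*f(x) is at most 1, which is where the soft-margin optimality conditions allow nonzero dual weights.

```python
import random


def make_data(n_per_class=100, noise_rate=0.10, seed=0):
    """Two Gaussian clusters; a random fraction of labels is flipped."""
    rng = random.Random(seed)
    X, y_true = [], []
    for label, center in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
            y_true.append(label)
    n_flip = round(noise_rate * len(X))
    flipped = sorted(rng.sample(range(len(X)), n_flip))
    y_noisy = list(y_true)
    for i in flipped:
        y_noisy[i] = -y_noisy[i]
    return X, y_noisy, flipped


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=500):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    w, b, n = [0.0, 0.0], 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for (x1, x2), yi in zip(X, y):
            if yi * (w[0] * x1 + w[1] * x2 + b) < 1.0:  # hinge term is active
                gw[0] -= yi * x1 / n
                gw[1] -= yi * x2 / n
                gb -= yi / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def support_vector_candidates(X, y, w, b, tol=1e-6):
    """Indices of points on or inside the margin (approximate support vectors)."""
    return [
        i
        for i, ((x1, x2), yi) in enumerate(zip(X, y))
        if yi * (w[0] * x1 + w[1] * x2 + b) <= 1.0 + tol
    ]


X, y_noisy, flipped = make_data()
w, b = train_linear_svm(X, y_noisy)
to_verify = support_vector_candidates(X, y_noisy, w, b)
caught = set(flipped) & set(to_verify)

print(f"points to verify: {len(to_verify)} of {len(X)}")
print(f"flipped labels among them: {len(caught)} of {len(flipped)}")
```

In this toy setting, a mislabeled point sits on the wrong side of the decision boundary for its recorded label, so its margin is negative and it lands in the set handed to the expert. The linear primal solver is a stand-in chosen to keep the sketch dependency-free; a library SVM that exposes its support-vector indices would fill the same role.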

Original language: American English
State: Published - Nov 1 2012

Keywords

  • Support vector machines
  • Noise
  • Training
  • Training data
  • Machine learning
  • Humans
  • Noise measurement

Disciplines

  • Life Sciences
