Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels
Research output: Contribution to journal › Journal article › Research › peer-review
Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels. / Deng, Lihui; Yang, Bo; Kang, Zhongfeng; Wu, Jiajin; Li, Shaosong; Xiang, Yanping.
In: Complex and Intelligent Systems, 2024.
TY - JOUR
T1 - Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels
AU - Deng, Lihui
AU - Yang, Bo
AU - Kang, Zhongfeng
AU - Wu, Jiajin
AU - Li, Shaosong
AU - Xiang, Yanping
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024
Y1 - 2024
N2 - Learning with Noisy Labels (LNL) methods aim to improve the accuracy of Deep Neural Networks (DNNs) when the training set contains samples with noisy or incorrect labels, and have become popular in recent years. Existing popular LNL methods frequently regard samples with high learning difficulty (high loss and low prediction probability) as noisy samples; however, irregular feature patterns from hard clean samples can also cause high learning difficulty, which can lead to the misclassification of hard clean samples as noisy samples. To address this limitation, we propose the Samples’ Learning Risk-based Learning with Noisy Labels (SLRLNL) method. Specifically, we propose to separate noisy samples from hard clean samples using samples’ learning risk, which represents a sample’s influence on the DNN’s accuracy. We show that samples’ learning risk is jointly determined by a sample’s learning difficulty and its feature similarity to other samples; thus, compared to existing LNL methods that rely solely on learning difficulty, our method can better separate hard clean samples from noisy samples, since the former frequently possess irregular feature patterns. Moreover, to extract more useful information from samples with irregular feature patterns (i.e., hard samples), we further propose the Relabeling-based Label Augmentation (RLA) process to prevent the memorization of hard noisy samples and to better learn the hard clean samples, thus enhancing the learning for hard samples. Empirical studies show that samples’ learning risk can identify noisy samples more accurately, and that the RLA process can enhance the learning for hard samples. To evaluate the effectiveness of our method, we compare it with popular existing LNL methods on CIFAR-10, CIFAR-100, Animal-10N, Clothing1M, and DocRED. The experimental results indicate that our method outperforms the other existing methods. The source code for SLRLNL can be found at https://github.com/yangbo1973/SLRLNL.
KW - Deep neural networks
KW - Generalization error
KW - Learning risk
KW - Learning with noisy labels
UR - http://www.scopus.com/inward/record.url?scp=85186461423&partnerID=8YFLogxK
U2 - 10.1007/s40747-024-01360-z
DO - 10.1007/s40747-024-01360-z
M3 - Journal article
AN - SCOPUS:85186461423
JO - Complex and Intelligent Systems
JF - Complex and Intelligent Systems
SN - 2199-4536
ER -
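
The abstract's core idea, separating noisy samples from hard clean samples by a learning-risk score rather than by loss alone, can be illustrated with a toy sketch. This is not the authors' SLRLNL implementation: the `learning_risk` formula, the `alpha` weight, the thresholds, and the sample values below are illustrative assumptions only, meant to show why a loss-only ("small-loss") split flags hard clean samples as noisy while a risk-style score that also credits feature similarity does not.

```python
# Hypothetical sketch of the idea described in the abstract: a sample's
# "learning risk" combines its learning difficulty (loss) with its feature
# similarity to samples sharing its label. NOT the authors' SLRLNL code;
# the formula and all numbers here are illustrative assumptions.

def learning_risk(loss, class_similarity, alpha=0.5):
    """Toy risk score: high loss raises risk, but strong feature similarity
    to same-labelled samples (typical of hard *clean* samples) lowers it."""
    return loss - alpha * class_similarity

def split_by_small_loss(samples, threshold):
    """Baseline criterion: treat every high-loss sample as noisy."""
    return [s["name"] for s in samples if s["loss"] > threshold]

def split_by_learning_risk(samples, threshold, alpha=0.5):
    """Risk-based criterion: a hard clean sample with high similarity
    to its class peers stays below the threshold and survives."""
    return [s["name"] for s in samples
            if learning_risk(s["loss"], s["similarity"], alpha) > threshold]

samples = [
    {"name": "easy_clean", "loss": 0.2, "similarity": 0.9},
    {"name": "hard_clean", "loss": 1.5, "similarity": 0.8},  # irregular features, clean label
    {"name": "noisy",      "loss": 1.6, "similarity": 0.1},  # mislabelled sample
]

# Loss-only split flags the hard clean sample together with the noisy one;
# the risk-based split flags only the noisy sample.
print(split_by_small_loss(samples, threshold=1.0))
print(split_by_learning_risk(samples, threshold=1.2))
```

Under these toy numbers the small-loss split discards both high-loss samples, while the risk-based split keeps the hard clean sample, which is the separation behaviour the abstract attributes to learning risk.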