Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels
Research output: Contribution to journal › Journal article › Research › peer-review
Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels. / Deng, Lihui; Yang, Bo; Kang, Zhongfeng; Wu, Jiajin; Li, Shaosong; Xiang, Yanping.
In: Complex and Intelligent Systems, 2024.
TY - JOUR
T1 - Separating hard clean samples from noisy samples with samples’ learning risk for DNN when learning with noisy labels
AU - Deng, Lihui
AU - Yang, Bo
AU - Kang, Zhongfeng
AU - Wu, Jiajin
AU - Li, Shaosong
AU - Xiang, Yanping
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024
Y1 - 2024
N2 - Learning with Noisy Labels (LNL) methods aim to improve the accuracy of Deep Neural Networks (DNNs) when the training set contains samples with noisy or incorrect labels, and have become popular in recent years. Existing popular LNL methods frequently regard samples with high learning difficulty (high loss and low prediction probability) as noisy samples; however, irregular feature patterns from hard clean samples can also cause high learning difficulty, which can lead to the misclassification of hard clean samples as noisy samples. To address this limitation, we propose the Samples’ Learning Risk-based Learning with Noisy Labels (SLRLNL) method. Specifically, we propose to separate noisy samples from hard clean samples using samples’ learning risk, which represents a sample’s influence on the DNN’s accuracy. We show that samples’ learning risk is jointly determined by a sample’s learning difficulty and its feature similarity to other samples; thus, compared to existing LNL methods that rely solely on learning difficulty, our method can better separate hard clean samples from noisy samples, since the former frequently possess irregular feature patterns. Moreover, to extract more useful information from samples with irregular feature patterns (i.e., hard samples), we further propose the Relabeling-based Label Augmentation (RLA) process to prevent the memorization of hard noisy samples and to better learn the hard clean samples, thus enhancing the learning for hard samples. Empirical studies show that samples’ learning risk can identify noisy samples more accurately, and that the RLA process can enhance the learning for hard samples. To evaluate the effectiveness of our method, we compare it with popular existing LNL methods on CIFAR-10, CIFAR-100, Animal-10N, Clothing1M, and DocRED. The experimental results indicate that our method outperforms the other existing methods. The source code for SLRLNL can be found at https://github.com/yangbo1973/SLRLNL.
KW - Deep neural networks
KW - Generalization error
KW - Learning risk
KW - Learning with noisy labels
UR - http://www.scopus.com/inward/record.url?scp=85186461423&partnerID=8YFLogxK
U2 - 10.1007/s40747-024-01360-z
DO - 10.1007/s40747-024-01360-z
M3 - Journal article
AN - SCOPUS:85186461423
JO - Complex and Intelligent Systems
JF - Complex and Intelligent Systems
SN - 2199-4536
ER -
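
The abstract's core idea, separating noisy samples from hard clean samples by a learning-risk score rather than by loss alone, can be illustrated with a toy sketch. This is not the authors' SLRLNL implementation: the `learning_risk` formula, the `alpha` weight, the thresholds, and the sample values below are illustrative assumptions only, meant to show why a loss-only ("small-loss") split flags hard clean samples as noisy while a risk-style score that also credits feature similarity does not.

```python
# Hypothetical sketch of the idea described in the abstract: a sample's
# "learning risk" combines its learning difficulty (loss) with its feature
# similarity to samples sharing its label. NOT the authors' SLRLNL code;
# the formula and all numbers here are illustrative assumptions.

def learning_risk(loss, class_similarity, alpha=0.5):
    """Toy risk score: high loss raises risk, but strong feature similarity
    to same-labelled samples (typical of hard *clean* samples) lowers it."""
    return loss - alpha * class_similarity

def split_by_small_loss(samples, threshold):
    """Baseline criterion: treat every high-loss sample as noisy."""
    return [s["name"] for s in samples if s["loss"] > threshold]

def split_by_learning_risk(samples, threshold, alpha=0.5):
    """Risk-based criterion: a hard clean sample with high similarity
    to its class peers stays below the threshold and survives."""
    return [s["name"] for s in samples
            if learning_risk(s["loss"], s["similarity"], alpha) > threshold]

samples = [
    {"name": "easy_clean", "loss": 0.2, "similarity": 0.9},
    {"name": "hard_clean", "loss": 1.5, "similarity": 0.8},  # irregular features, clean label
    {"name": "noisy",      "loss": 1.6, "similarity": 0.1},  # mislabelled sample
]

# Loss-only split flags the hard clean sample together with the noisy one;
# the risk-based split flags only the noisy sample.
print(split_by_small_loss(samples, threshold=1.0))
print(split_by_learning_risk(samples, threshold=1.2))
```

Under these toy numbers the small-loss split discards both high-loss samples, while the risk-based split keeps the hard clean sample, which is the separation behaviour the abstract attributes to learning risk.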