Context Transformer and Adaptive Method with Visual Transformer for Robust Facial Expression Recognition

Xiong, Lingxin and Zhang, Jicun and Zheng, Xiaojia and Wang, Yuxin (2024) Context Transformer and Adaptive Method with Visual Transformer for Robust Facial Expression Recognition. Applied Sciences, 14 (4). p. 1535. ISSN 2076-3417


Official URL: https://doi.org/10.3390/app14041535

Abstract

In real-world scenarios, facial expression recognition faces several challenges, including lighting variations, image noise, and face occlusion, which limit the performance of existing models in complex situations. To address these problems, we introduce a CoT module between the CNN and ViT frameworks. By learning the correlations between local area features at a fine-grained level, the module improves the model's ability to perceive subtle differences, helps maintain consistency between the local area features and the global expression, and makes the model more adaptable to complex lighting conditions. In addition, we adopt an adaptive learning method that reduces the interference of noise and occlusion by dynamically adjusting the parameters of the Transformer Encoder’s self-attention weight matrix. In experiments on the Oulu-CASIA dataset, our CoT_AdaViT model achieves accuracies of 87.94% (NIR) and, for VL, 89.47% (strong), 84.76% (weak), and 82.28% (dark). On the CK+, RAF-DB, and FERPlus datasets, it achieves recognition accuracies of 99.20%, 91.07%, and 90.57%, respectively. These results indicate that the model is accurate and robust in complex scenes.

Item Type:	Article
Subjects:	Science Global Plos > Multidisciplinary
Depositing User:	Unnamed user with email support@science.globalplos.com
Date Deposited:	15 Feb 2024 05:28
Last Modified:	15 Feb 2024 05:28
URI:	http://ebooks.manu2sent.com/id/eprint/2496
