Jiang, G., Jiang, X., Fang, Z., et al. An efficient attention module for 3d convolutional neural networks in action recognition. Applied Intelligence.

The paper "An efficient attention module for 3d convolutional neural networks in action recognition" by Guanghao Jiang, a 2018-cohort graduate student in our team, has been accepted by the SCI journal Applied Intelligence. Congratulations!

Abstract:
Due to illumination changes, varying postures, and occlusion, accurately recognizing actions in videos remains a challenging task. A three-dimensional convolutional neural network (3D CNN), which can extract spatio-temporal features from sequences simultaneously, is one of the mainstream models for action recognition. However, most existing 3D CNN models ignore the importance of individual frames and spatial regions when recognizing actions. To address this problem, we propose an efficient attention module (EAM) that contains two sub-modules: a spatial efficient attention module (EAM-S) and a temporal efficient attention module (EAM-T). Specifically, without dimensionality reduction, EAM-S mines category-based correlation through local cross-channel interaction and assigns high weights to important image regions, while EAM-T estimates the importance score of each frame through cross-frame interaction between that frame and its neighbors. The proposed EAM module is lightweight yet effective and can be easily embedded into 3D CNN-based action recognition models. Extensive experiments on the challenging HMDB-51 and UCF-101 datasets show that the proposed module achieves state-of-the-art performance and significantly improves the recognition accuracy of 3D CNN-based action recognition methods.
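To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of the idea it describes: attention weights produced by a local 1D convolution without dimensionality reduction, applied once across channels and once across frames. The class name, kernel sizes, and pooling choices are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class EAMSketch(nn.Module):
    """Hypothetical sketch of the efficient attention idea in the abstract:
    weights from a local 1D convolution (no dimensionality reduction),
    once across channels (EAM-S analog) and once across frames
    (EAM-T analog). Kernel sizes and pooling are assumptions."""

    def __init__(self, k_channel=3, k_time=3):
        super().__init__()
        self.conv_c = nn.Conv1d(1, 1, k_channel, padding=k_channel // 2, bias=False)
        self.conv_t = nn.Conv1d(1, 1, k_time, padding=k_time // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):  # x: (N, C, T, H, W) from a 3D CNN stage
        n, c, t, h, w = x.shape
        # Local cross-channel interaction: pool over (T, H, W), then
        # convolve along the channel axis so each channel's weight
        # depends only on its neighbors.
        y = x.mean(dim=(2, 3, 4))                     # (N, C)
        y = self.conv_c(y.unsqueeze(1)).squeeze(1)    # (N, C)
        x = x * self.sigmoid(y).view(n, c, 1, 1, 1)
        # Cross-frame interaction: pool over (C, H, W), then convolve
        # along the temporal axis so each frame's score depends on its
        # neighboring frames.
        z = x.mean(dim=(1, 3, 4))                     # (N, T)
        z = self.conv_t(z.unsqueeze(1)).squeeze(1)    # (N, T)
        return x * self.sigmoid(z).view(n, 1, t, 1, 1)

# Usage: drop the module between stages of a 3D CNN backbone.
out = EAMSketch()(torch.randn(2, 64, 16, 28, 28))
```

Because this adds only two tiny 1D convolutions, it stays lightweight in the sense the abstract claims, which is what makes such a module easy to embed into existing 3D CNN backbones.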

Download: [Official link] [Preprint version]

Keywords: Efficient attention module, 3D CNN, Deep learning, Action recognition.

Kaiying Zhu, Xiaoyan Jiang, Zhijun Fang, Yongbin Gao, Jenq-Neng Hwang, et al. Photometric transfer for direct visual odometry. Knowledge-Based Systems, 2021.

The paper "Photometric transfer for direct visual odometry" by Kaiying Zhu, a 2018-cohort graduate student in our team, has been accepted by the SCI journal Knowledge-Based Systems. Congratulations!

Abstract:
Due to its efficient use of photometric information, direct visual odometry (DVO) is widely used to simultaneously estimate the ego-motion of a moving camera and map the environment from videos, especially in challenging weakly textured scenarios. However, DVO suffers from brightness discrepancies because it directly uses the intensity patterns of pixels to register frames for camera pose estimation. Most existing brightness transfer methods build a fixed transfer function, which is inappropriate for the successive and inconsistent brightness changes encountered in practice. To overcome this problem, we propose a Photometric Transfer Net (PTNet) that is trained to remove brightness discrepancies between two frames pixel-wise without destroying context information. Photometric consistency in DVO is obtained by adjusting the source frame according to the reference frame. Since no dataset is available for training the photometric transfer model, we augment the EuRoC dataset by generating, for each original frame, a number of frames with different brightness levels through a nonlinear transformation. The required training data, containing various brightness changes and scene movements along with ground truth, can then be collected from the extended sequences. Evaluations on both real-world and synthetic datasets demonstrate the effectiveness of the proposed model. Assessment on an unseen dataset, with model parameters fixed after training on another dataset, confirms the generalization ability of the model. Furthermore, we embed the model into DVO to preprocess input data with brightness discrepancies. Experimental results show that PTNet-based DVO achieves more robust initialization and more accurate pose estimation than the original method.
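The augmentation step described above can be illustrated with a short sketch. Here the nonlinear transformation is assumed to be gamma correction, which is one common choice; the abstract does not specify the exact function, so the values below are hypothetical.

```python
import numpy as np

def augment_brightness(frame, gammas=(0.5, 0.8, 1.25, 2.0)):
    """Generate brightness-shifted copies of a grayscale frame.

    Sketch of the data-augmentation idea in the abstract: each original
    frame is mapped through a nonlinear intensity transform to simulate
    inconsistent brightness. Gamma correction and the gamma values are
    assumptions, not the paper's reported settings.
    """
    f = frame.astype(np.float32) / 255.0  # normalize to [0, 1]
    # Each transformed frame can serve as a source image, with the
    # original frame as its photometric reference, forming one
    # (source, reference) training pair for the transfer network.
    return [(np.power(f, g) * 255.0).astype(np.uint8) for g in gammas]
```

Pairing each transformed frame with its unmodified original yields training data where the brightness change is known exactly, which is what allows supervision without a dedicated photometric-transfer dataset.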

Download: [Official link] [Preprint version]

Keywords: Photometric transfer, Direct visual odometry, Data augmentation, Brightness discrepancy, Deep learning.

Yang Li, Xiaoyan Jiang, Jenq-Neng Hwang. Effective person re-identification by self-attention model guided feature learning. Knowledge-Based Systems.

The paper "Effective Person Re-identification by Self-Attention Model Guided Feature Learning" by Yang Li, a 2017-cohort graduate student in our team, has been accepted by the SCI journal Knowledge-Based Systems. Congratulations!

Abstract:
Person re-identification (re-ID), whose goal is to recognize the identities of persons in images captured by non-overlapping cameras, is a challenging topic in computer vision. Most existing person re-ID methods operate directly on detected objects and ignore the spatial misalignment caused by detectors, human pose variation, and occlusion. To tackle these difficulties, we propose a self-attention model guided deep convolutional neural network (DCNN) that learns robust features from image shots. Kernels of the self-attention model evaluate weights for the importance of different person regions. To overcome the local dependence of feature extraction, the non-local feature map generated by the self-attention model is fused with the original feature map produced by ResNet-50. Furthermore, the loss function combines the cross-entropy loss and the triplet loss during training, which enables the network to capture common characteristics of the same individual and significant differences between distinct persons. Extensive experiments and comparative evaluations show that our proposed strategy outperforms most state-of-the-art methods on the standard datasets Market-1501, DukeMTMC-reID, and CUHK03.
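As a small illustration of the joint objective the abstract mentions, the PyTorch sketch below combines a cross-entropy identity loss with a triplet margin loss over embeddings. The margin and the weighting factor are assumptions, not values reported in the paper.

```python
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=0.3)  # margin is an assumed value

def reid_loss(logits, labels, anchor, positive, negative, w=1.0):
    """Joint re-ID objective: identity classification plus metric learning.

    logits: (N, num_ids) identity predictions; labels: (N,) identity ids;
    anchor/positive/negative: (N, D) embeddings of a sampled triplet.
    The weight `w` balancing the two terms is a hypothetical choice.
    """
    return ce_loss(logits, labels) + w * triplet_loss(anchor, positive, negative)
```

The cross-entropy term pulls features of the same identity toward a shared class, while the triplet term pushes embeddings of different identities apart, matching the intuition stated in the abstract.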

Download: [Official link] [Preprint version]

Keywords: Person re-identification, Feature extraction, Self-attention, Cross-entropy loss, Triplet loss.
