Xiaoyan Jiang, Xinlong Wan, Kaiying Zhu, Xihe Qiu, Zhijun Fang, Distribution-aware Noisy-label Crack Segmentation, 2024

Team paper "Distribution-aware Noisy-label Crack Segmentation", IEEE International Conference on Robotics and Automation

Abstract:
Road crack segmentation is critical for robotic systems tasked with the inspection, maintenance, and monitoring of road infrastructures. Existing deep learning-based methods for crack segmentation are typically trained on specific datasets, which can lead to significant performance degradation when applied to unseen real-world scenarios. To address this, we introduce the SAM-Adapter, which incorporates the general knowledge of the Segment Anything Model (SAM) into crack segmentation, demonstrating enhanced performance and generalization capabilities. However, the effectiveness of the SAM-Adapter is constrained by noisy labels within small-scale training sets, including omissions and mislabeling of cracks. In this paper, we present an innovative joint learning framework that utilizes distribution-aware domain-specific semantic knowledge to guide the discriminative learning process of the SAM-Adapter. To our knowledge, this is the first approach that effectively minimizes the adverse effects of noisy labels on the supervised learning of the SAM-Adapter. Our experimental results on two public pavement crack segmentation datasets confirm that our method significantly outperforms existing state-of-the-art techniques. Furthermore, evaluations on the completely unseen CFD dataset demonstrate the high cross-domain generalization capability of our model, underscoring its potential for practical applications in crack segmentation.

Download: [preprint version]

Keywords: SAM-Adapter, Crack Detection

Xiaoyan Jiang, Licheng Jiang, Anjie Wang, Kaiying Zhu, Yongbin Gao, CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation, 2024

Team paper "CrackSegDiff: Diffusion Probability Model-based Multi-modal Crack Segmentation", IEEE International Conference on Robotics and Automation

Abstract:
Integrating grayscale and depth data in road inspection robots could enhance the accuracy, reliability, and comprehensiveness of road condition assessments, leading to improved maintenance strategies and safer infrastructure. However, these data sources are often compromised by significant background noise from the pavement. Recent advancements in Diffusion Probabilistic Models (DPM) have demonstrated remarkable success in image segmentation tasks, showcasing potent denoising capabilities, as evidenced in studies like SegDiff [1]. Despite these advancements, current DPM-based segmentors do not fully capitalize on the potential of original image data. In this paper, we propose a novel DPM-based approach for crack segmentation, named CrackSegDiff, which uniquely fuses grayscale and range/depth images. This method enhances the reverse diffusion process by intensifying the interaction between local feature extraction via DPM and global feature extraction. Unlike traditional methods that utilize Transformers for global features, our approach employs Vm-unet [2] to efficiently capture long-range information of the original data. The integration of features is further refined through two innovative modules: the Channel Fusion Module (CFM) and the Shallow Feature Compensation Module (SFCM). Our experimental evaluation on the three-class crack image segmentation tasks within the FIND dataset demonstrates that CrackSegDiff outperforms state-of-the-art methods, particularly excelling in the detection of shallow cracks. Code is available at https://github.com/sky-visionX/CrackSegDiff.

Download: [preprint version]

Keywords: Channel Fusion Module, Diffusion Probabilistic Models, Shallow Feature Compensation Module

Fan L, Chen W, Jiang X, Cross-Correlation Fusion Graph Convolution-Based Object Tracking, *Symmetry*, 2023

Congratulations to Fan Liuyi, a graduate student of the team's 2019 cohort, whose paper "Cross-Correlation Fusion Graph Convolution-Based Object Tracking" has been accepted by the journal "Multidisciplinary Digital Publishing Institute Symmetry"!

Abstract:
Most popular graph attention networks treat pixels of a feature map as individual nodes, which makes the feature embedding extracted by the graph convolution lack the integrity of the object. Moreover, matching between a template graph and a search graph using only part-level information usually causes tracking errors, especially in occlusion and similarity situations. To address these problems, we propose a novel end-to-end graph attention tracking framework that has high symmetry, combining traditional cross-correlation operations directly. By utilizing cross-correlation operations, we effectively compensate for the dispersion of graph nodes and enhance the representation of features. Additionally, our graph attention fusion model performs both part-to-part matching and global matching, allowing for more accurate information embedding in the template and search regions. Furthermore, we optimize the information embedding between the template and search branches to achieve better single-object tracking results, particularly in occlusion and similarity scenarios. The flexibility of graph nodes and the comprehensiveness of information embedding have brought significant performance improvements in our framework. Extensive experiments on three challenging public datasets (LaSOT, GOT-10k, and VOT2016) show that our tracker outperforms other state-of-the-art trackers.
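
The "traditional cross-correlation operation" mentioned in the abstract can be illustrated with a minimal sketch. This is the generic sliding-window cross-correlation used by Siamese trackers, written in plain Python on a single 2D channel; it is not the paper's fusion model, and the function name `cross_correlate` and the list-of-lists layout are assumptions made for this example.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map.

    Both inputs are 2D lists of floats (one feature channel each).
    Hypothetical helper for illustration, not the authors' code.
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        raise ValueError("template must not be larger than the search region")
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the window at (i, j).
            score = sum(
                search[i + u][j + v] * template[u][v]
                for u in range(th)
                for v in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


search = [[0, 0, 0],
          [0, 1, 2],
          [0, 3, 4]]
template = [[1, 2],
            [3, 4]]
response = cross_correlate(search, template)
```

The largest value in `response` marks the window that best matches the template, which is how a tracker of this family localizes the target in the search region.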

Download: [official link]

Keywords: symmetry; single-object tracking; graph attention network; Siamese networks; cross-correlation; feature fusion


Xiaoyan Jiang, J N Hwang and Z Fang, "A Multiscale Coarse-to-Fine Human Pose Estimation Network With Hard Keypoint Mining" in IEEE Transactions on Systems, Man, and Cybernetics: Systems, March 2024

Congratulations to team leader Xiaoyan Jiang, whose paper "A Multiscale Coarse-to-Fine Human Pose Estimation Network With Hard Keypoint Mining" has been accepted by the SCI journal IEEE Transactions on Systems, Man, and Cybernetics: Systems!

Abstract:
Current convolutional neural network (CNN)-based multiperson pose estimators have achieved great progress; however, they pay little or no attention to “hard” samples, such as occluded keypoints, small and nearly invisible keypoints, and ambiguous keypoints. In this article, we explicitly deal with these “hard” samples by proposing a novel multiscale coarse-to-fine human pose estimation network (HM2PN), which includes two sequential subnetworks: CoarseNet and FineNet. CoarseNet conducts a coarse prediction to locate “simple” keypoints like hands and ankles with a multiscale fusion module, which is integrated with a bottleneck, resulting in a novel module called the multiscale bottleneck. The new module improves the multiscale representation ability of the network at a fine-grained level, while marginally reducing the computation cost because of group convolution. FineNet further infers “hard” keypoints and refines “simple” keypoints simultaneously with a hard keypoint mining loss. Distinct from previous works, the proposed loss deals with “hard” keypoints differentially and prevents “simple” keypoints from dominating the computed gradients during training. Experiments on the COCO keypoint benchmark show that our approach achieves superior pose estimation performance compared with other state-of-the-art methods.
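
The idea of keeping "simple" keypoints from dominating the gradients can be sketched with the generic online hard keypoint mining scheme from earlier pose-estimation work: compute one loss per keypoint and average only the largest ones. The abstract states that the paper's loss is distinct from previous works, so the following is only a baseline illustration under that assumption; `hard_keypoint_loss`, `top_k`, and the flattened-heatmap layout are hypothetical names chosen for this example.

```python
def hard_keypoint_loss(pred, target, top_k):
    """Average the per-keypoint MSE over the `top_k` hardest keypoints.

    pred, target: lists of K heatmaps, each a flat list of floats.
    Illustrative baseline only, not the loss proposed in the paper.
    """
    if top_k <= 0 or top_k > len(pred):
        raise ValueError("top_k must be between 1 and the number of keypoints")
    per_keypoint = []
    for p_map, t_map in zip(pred, target):
        # Mean squared error of one keypoint's heatmap.
        mse = sum((p - t) ** 2 for p, t in zip(p_map, t_map)) / len(p_map)
        per_keypoint.append(mse)
    # Keep only the keypoints with the largest error ("hard" keypoints).
    hardest = sorted(per_keypoint, reverse=True)[:top_k]
    return sum(hardest) / top_k


pred = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
loss_top2 = hard_keypoint_loss(pred, target, top_k=2)
```

With `top_k=2`, the first keypoint (zero error) contributes nothing, so the loss is driven entirely by the two poorly predicted keypoints.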

Download: [preprint version]

Keywords: Hard sample mining, human pose estimation, multiscale


Kunlun Xue, Xiaoyan Jiang, Zhichao Chen, "A SLAM Method Based on ORB-SLAM3 Which Mixed GNSS Data", International Conference on Information Technologies and Electrical Engineering

Congratulations to Xue Kunlun, a graduate student of the team's 2021 cohort, whose paper "A SLAM Method Based on ORB-SLAM3 Which Mixed GNSS Data" has been accepted by the 6th International Conference on Information Technologies and Electrical Engineering!

Abstract:
Traditional single-sensor SLAM methods suffer from cumulative drift errors in large-scale outdoor environments, which makes it difficult to achieve good localization accuracy in practical application scenarios. To solve this problem, we propose a method that fuses a visual-inertial system with the global navigation satellite system (GNSS). The method transforms GNSS measurements into values in a Cartesian coordinate system and then uses odometry pose information and GNSS information in a nonlinear optimization to eliminate the cumulative drift error within the system. Experiments on the KITTI raw data show that the proposed method effectively improves localization accuracy in large-scale outdoor environments: on the KITTI dataset, the localization accuracy is on average 54% higher than that of ORB-SLAM3.
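
The step of transforming GNSS measurements into a Cartesian coordinate system can be sketched as follows. The abstract does not say which Cartesian frame the authors use, so this example assumes a common choice: converting WGS-84 latitude, longitude, and altitude to Earth-centered Earth-fixed (ECEF) coordinates and then to a local east-north-up (ENU) frame anchored at a reference fix such as the first GNSS reading. The function names and the sample coordinates are illustrative assumptions.

```python
import math

# WGS-84 ellipsoid constants.
WGS84_A = 6378137.0                    # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert WGS-84 latitude/longitude (degrees) and altitude (m) to ECEF (m)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Express a GNSS fix in a local east-north-up frame at the reference fix."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0 = math.radians(ref_lat_deg)
    lon0 = math.radians(ref_lon_deg)
    # Rotate the ECEF offset into the local tangent plane.
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up


# Illustrative reference fix (roughly the Karlsruhe area); values are made up.
REF = (49.0, 8.4, 110.0)
origin_enu = geodetic_to_enu(*REF, *REF)
raised_enu = geodetic_to_enu(REF[0], REF[1], REF[2] + 10.0, *REF)
north_enu = geodetic_to_enu(REF[0] + 0.001, REF[1], REF[2], *REF)
```

Once the fixes are expressed in metres in a local frame like this, they can serve as position constraints alongside odometry poses in a nonlinear optimization.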

Download: [official link]

Keywords: Simultaneous localization and mapping, Multi-source mixed, Automatic driving, Nonlinear optimization