Xiaoyan Jiang, Licheng Jiang, Anjie Wang, Kaiying Zhu, Yongbin Gao, CrackSegdiff: Diffusion Probability Model-based Multi-modal Crack Segmentation, 2024

The team's paper “CrackSegdiff: Diffusion Probability Model-based Multi-modal Crack Segmentation” has been submitted to the IEEE International Conference on Robotics and Automation and is under review. Congratulations!

Abstract:
Integrating grayscale and depth data in road inspection robots could enhance the accuracy, reliability, and comprehensiveness of road condition assessments, leading to improved maintenance strategies and safer infrastructure. However, these data sources are often compromised by significant background noise from the pavement. Recent advancements in Diffusion Probabilistic Models (DPM) have demonstrated remarkable success in image segmentation tasks, showcasing potent denoising capabilities, as evidenced in studies like SegDiff [1]. Despite these advancements, current DPM-based segmentors do not fully capitalize on the potential of original image data. In this paper, we propose a novel DPM-based approach for crack segmentation, named CrackSegDiff, which uniquely fuses grayscale and range/depth images. This method enhances the reverse diffusion process by intensifying the interaction between local feature extraction via DPM and global feature extraction. Unlike traditional methods that utilize Transformers for global features, our approach employs Vm-unet [2] to efficiently capture long-range information of the original data. The integration of features is further refined through two innovative modules: the Channel Fusion Module (CFM) and the Shallow Feature Compensation Module (SFCM). Our experimental evaluation on the three-class crack image segmentation tasks within the FIND dataset demonstrates that CrackSegDiff outperforms state-of-the-art methods, particularly excelling in the detection of shallow cracks. Code is available at https://github.com/sky-visionX/CrackSegDiff.

Download: [preprint version]

Keywords: Channel Fusion Module, Diffusion Probabilistic Models, Shallow Feature Compensation Module

Photos:

Fan L, Chen W, Jiang X. Cross-Correlation Fusion Graph Convolution-Based Object Tracking, *Symmetry*, 2023

The paper “Cross-Correlation Fusion Graph Convolution-Based Object Tracking” by Liuyi Fan, a graduate student who joined the team in 2019, has been accepted by the journal Symmetry, published by the Multidisciplinary Digital Publishing Institute (MDPI). Congratulations!

Abstract:
Most popular graph attention networks treat pixels of a feature map as individual nodes, which makes the feature embedding extracted by the graph convolution lack the integrity of the object. Moreover, matching between a template graph and a search graph using only part-level information usually causes tracking errors, especially in occlusion and similarity situations. To address these problems, we propose a novel end-to-end graph attention tracking framework that has high symmetry, combining traditional cross-correlation operations directly. By utilizing cross-correlation operations, we effectively compensate for the dispersion of graph nodes and enhance the representation of features. Additionally, our graph attention fusion model performs both part-to-part matching and global matching, allowing for more accurate information embedding in the template and search regions. Furthermore, we optimize the information embedding between the template and search branches to achieve better single-object tracking results, particularly in occlusion and similarity scenarios. The flexibility of graph nodes and the comprehensiveness of information embedding have brought significant performance improvements in our framework. Extensive experiments on three challenging public datasets (LaSOT, GOT-10k, and VOT2016) show that our tracker outperforms other state-of-the-art trackers.
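For readers unfamiliar with the "traditional cross-correlation operation" mentioned in the abstract, the following is a minimal sketch of the classical Siamese-tracker version: a template feature map is slid over a search feature map and a similarity score is recorded at each offset. It is not the paper's graph attention fusion model. The function name `cross_correlate` and the single-channel list-of-lists layout are assumptions made for illustration.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` and return the response map.

    Both inputs are single-channel 2D feature maps given as lists of rows
    (a simplifying assumption; real trackers use multi-channel tensors).
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the search window at (i, j).
            score = sum(
                search[i + a][j + b] * template[a][b]
                for a in range(th)
                for b in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_location(response):
    """Return the (row, col) offset with the highest response."""
    best = max(
        (value, i, j)
        for i, row in enumerate(response)
        for j, value in enumerate(row)
    )
    return best[1], best[2]
```

The peak of the response map marks the offset in the search region that best matches the template, which is the target estimate in this simplified setting.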

Download: [official link]

Keywords: symmetry; single-object tracking; graph attention network; Siamese networks; cross-correlation; feature fusion

Photos:

Xiaoyan Jiang, J N Hwang and Z Fang, "A Multiscale Coarse-to-Fine Human Pose Estimation Network With Hard Keypoint Mining" in IEEE Transactions on Systems, Man, and Cybernetics: Systems, March 2024

The paper “A Multiscale Coarse-to-Fine Human Pose Estimation Network With Hard Keypoint Mining” by team leader Xiaoyan Jiang has been accepted by the SCI-indexed journal IEEE Transactions on Systems, Man, and Cybernetics: Systems. Congratulations!

Abstract:
Current convolutional neural network (CNN)-based multiperson pose estimators have achieved great progress; however, they pay little or no attention to “hard” samples, such as occluded keypoints, small and nearly invisible keypoints, and ambiguous keypoints. In this article, we explicitly deal with these “hard” samples by proposing a novel multiscale coarse-to-fine human pose estimation network (HM2PN), which includes two sequential subnetworks: CoarseNet and FineNet. CoarseNet conducts a coarse prediction to locate “simple” keypoints like hands and ankles with a multiscale fusion module, which is integrated with a bottleneck, resulting in a novel module called the multiscale bottleneck. The new module improves the multiscale representation ability of the network at a fine-grained level, while marginally reducing the computation cost because of group convolution. FineNet further infers “hard” keypoints and refines “simple” keypoints simultaneously with a hard keypoint mining loss. Distinct from previous works, the proposed loss treats “hard” keypoints differently and prevents “simple” keypoints from dominating the computed gradients during training. Experiments on the COCO keypoint benchmark show that our approach achieves superior pose estimation performance compared with other state-of-the-art methods.
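The abstract does not give the form of the hard keypoint mining loss. As a rough illustration of the general idea, the sketch below uses a commonly seen top-k variant (online hard keypoint mining): compute a loss per keypoint, then average only the k largest, so easy keypoints contribute no gradient. This is a stand-in technique, not the loss proposed in the paper. The function names and the flattened-heatmap input layout are assumptions.

```python
def per_keypoint_mse(pred_heatmaps, target_heatmaps):
    """Mean squared error for each keypoint.

    Each heatmap is a flat list of floats (an assumption for brevity).
    """
    losses = []
    for pred, target in zip(pred_heatmaps, target_heatmaps):
        squared = [(p - t) ** 2 for p, t in zip(pred, target)]
        losses.append(sum(squared) / len(squared))
    return losses


def top_k_hard_keypoint_loss(pred_heatmaps, target_heatmaps, k):
    """Average the k largest per-keypoint losses and ignore the rest.

    Keypoints outside the top k are treated as "simple" for this sample
    and are excluded from the loss.
    """
    losses = per_keypoint_mse(pred_heatmaps, target_heatmaps)
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / len(hardest)
```

With three keypoints whose individual losses are 0, 1 and 0.25, choosing k = 2 averages only the two largest values, so the perfectly predicted keypoint does not dilute the loss.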

Download: [preprint version]

Keywords: Hard sample mining, human pose estimation, multiscale

Photos:

Kunlun Xue, Xiaoyan Jiang, Zhichao Chen, “A SLAM Method Based on ORB-SLAM3 Which Mixed GNSS Data”, International Conference on Information Technologies and Electrical Engineering

The paper “A SLAM Method Based on ORB-SLAM3 Which Mixed GNSS Data” by Kunlun Xue, a graduate student who joined the team in 2021, has been accepted by the 6th International Conference on Information Technologies and Electrical Engineering. Congratulations!

Abstract:
Traditional single-sensor SLAM methods suffer from cumulative drift errors in large-scale outdoor environments, which makes it difficult to achieve good localization accuracy in practical application scenarios. To solve this problem, we propose a method that fuses a visual-inertial system with the global navigation satellite system (GNSS). The method transforms GNSS measurements into values in a Cartesian coordinate system, and then uses odometry pose information and GNSS information in a nonlinear optimization to eliminate the cumulative drift error within the system. Experiments on the KITTI raw data show that the proposed method effectively improves localization accuracy in large-scale outdoor environments: on average, its localization accuracy on the KITTI dataset is 54% higher than that of ORB-SLAM3.
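The abstract does not say which Cartesian frame the GNSS measurements are transformed into. One common choice, shown here only as an assumed sketch, is to convert WGS-84 latitude, longitude and altitude to Earth-centered Earth-fixed (ECEF) coordinates and then to a local east-north-up (ENU) frame anchored at a reference fix such as the first GNSS reading. The function names are hypothetical; the ellipsoid constants are the standard WGS-84 values.

```python
import math

# Standard WGS-84 ellipsoid constants.
WGS84_A = 6378137.0                    # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude/longitude (degrees) and altitude (metres) to ECEF metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, alt_m, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Express a GNSS fix in a local ENU frame centred on a reference fix."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, alt_m)
    x0, y0, z0 = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0 = math.radians(ref_lat_deg)
    lon0 = math.radians(ref_lon_deg)
    # Rotate the ECEF offset into the tangent plane at the reference point.
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up
```

In a fusion pipeline of the kind the abstract describes, positions in such a local metric frame can be compared directly with odometry poses inside the nonlinear optimization.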

Download: [official link]

Keywords: Simultaneous localization and mapping, Multi-source mixed, Automatic driving, Nonlinear optimization

Photos:

Wenwen Zheng, Xiaoyan Jiang, Zhijun Fang, et al., "TV-Net: A Structure-Level Feature Fusion Network Based on Tensor Voting for Road Crack Segmentation" in IEEE Transactions on Intelligent Transportation Systems, June 2024

The paper “TV-Net: A Structure-Level Feature Fusion Network Based on Tensor Voting for Road Crack Segmentation” by Wenwen Zheng, a graduate student who joined the team in 2021, has been accepted by the top SCI journal IEEE Transactions on Intelligent Transportation Systems. Congratulations!

Abstract:
Pavement cracks are a common and significant problem for intelligent pavement maintenance. However, the features extracted from pavement images are often texture-less, and noise interference can be high. Segmentation with a traditionally trained convolutional neural network can lose feature information as the network depth increases, which makes accurate prediction challenging. To address these issues, we propose a new approach that features an enhanced tensor voting module and a customized pixel-level pavement crack segmentation network structure, called TV-Net. We optimize the tensor voting framework and find the relationship between tensor scale factors and crack distributions. A tensor voting fusion module is introduced to enhance feature maps by incorporating significant domain maps generated by tensor voting. Additionally, we propose a structural consistency loss function to improve segmentation accuracy and ensure consistency with the structural characteristics of the cracks obtained through tensor voting. Extensive experimental analysis demonstrates that our method outperforms existing mainstream pixel-level segmentation networks on the same road crack dataset. The proposed TV-Net performs well at avoiding noise interference and strengthening the structure at the fracture sites of pavement cracks.

Download: [official link] [preprint version]

Keywords: Crack detection, convolutional neural network, tensor voting, U-Net

Photos: