Wenwen Zheng, Xiaoyan Jiang, Zhijun Fang, et al., "TV-Net: A Structure-Level Feature Fusion Network Based on Tensor Voting for Road Crack Segmentation," IEEE Transactions on Intelligent Transportation Systems, June 2024

The paper "TV-Net: A Structure-Level Feature Fusion Network Based on Tensor Voting for Road Crack Segmentation" by Wenwen Zheng, a graduate student (class of 2021) in our group, has been accepted by the top SCI journal IEEE Transactions on Intelligent Transportation Systems. Congratulations!

Abstract:
Pavement cracks are a common and significant problem for intelligent pavement maintenance. However, the features extracted from pavement images are often texture-less, and noise interference can be severe. Segmentation with traditional convolutional neural networks can lose feature information as the network depth increases, which makes accurate prediction challenging. To address these issues, we propose a new approach that features an enhanced tensor voting module and a customized pixel-level pavement crack segmentation network, called TV-Net. We optimize the tensor voting framework and establish the relationship between tensor scale factors and crack distributions. A tensor voting fusion module is introduced to enhance feature maps by incorporating significant domain maps generated by tensor voting. Additionally, we propose a structural consistency loss function to improve segmentation accuracy and ensure consistency with the structural characteristics of the cracks obtained through tensor voting. Extensive experimental analysis demonstrates that our method outperforms existing mainstream pixel-level segmentation networks on the same road crack dataset. The proposed TV-Net performs excellently in suppressing noise interference and strengthening the structure of crack regions.
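To make the fusion idea concrete, below is a minimal, self-contained sketch, not the paper's implementation: it substitutes a structure-tensor stick saliency map (lambda1 - lambda2) for the paper's tensor voting output and re-weights CNN features with it. All module names, shapes, and the sigma value are illustrative assumptions.

```python
# Illustrative sketch only: structure-tensor saliency stands in for the paper's
# tensor voting "significant domain map"; a gated residual fuses it with features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def stick_saliency(gray, sigma=2.0):
    """Per-pixel line saliency (lambda1 - lambda2) of a smoothed structure tensor."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                      # Sobel gradients
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    k = int(2 * sigma) * 2 + 1                   # box blur as a Gaussian stand-in
    blur = lambda t: F.avg_pool2d(t, k, stride=1, padding=k // 2)
    jxx, jyy, jxy = blur(gx * gx), blur(gy * gy), blur(gx * gy)
    tmp = torch.sqrt((jxx - jyy) ** 2 + 4 * jxy ** 2)
    lam1 = 0.5 * (jxx + jyy + tmp)
    lam2 = 0.5 * (jxx + jyy - tmp)
    return lam1 - lam2                           # high on elongated (crack-like) structures

class SaliencyFusion(nn.Module):
    """Re-weight decoder features with the (resized) saliency map."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(channels + 1, channels, kernel_size=1)

    def forward(self, feat, saliency):
        s = F.interpolate(saliency, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        s = (s - s.amin()) / (s.amax() - s.amin() + 1e-6)          # normalize to [0, 1]
        gate = torch.sigmoid(self.gate(torch.cat([feat, s], dim=1)))
        return feat * gate + feat                                   # residual enhancement

# Usage: fuse saliency of a grayscale crack image with 64-channel decoder features
img = torch.rand(1, 1, 256, 256)
feat = torch.rand(1, 64, 64, 64)
fused = SaliencyFusion(64)(feat, stick_saliency(img))
print(fused.shape)   # torch.Size([1, 64, 64, 64])
```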

Download: [Official link] [Preprint]

Keywords: Crack detection, convolutional neural network, tensor voting, U-Net

Photos:

Baihong Han, Xiaoyan Jiang, Zhijun Fang, Hamido Fujita, Yongbin Gao, "F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models," Pattern Recognition, 2024

The paper "F-SCP: An automatic prompt generation method for specific classes based on visual language pre-training models" by Baihong Han, a graduate student (class of 2021) in our group, has been accepted by the top SCI journal Pattern Recognition. Congratulations!

Abstract:
The zero-shot classification performance of large-scale vision-language pre-training models (e.g., CLIP, BLIP, and ALIGN) can be enhanced by incorporating a prompt (e.g., "a photo of a [CLASS]") before the class words. Modifying the prompt slightly can have a significant effect on the classification outcomes of these models. Thus, it is crucial to include an appropriate prompt tailored to the classes. However, manual prompt design is labor-intensive and necessitates domain-specific expertise. CoOp (Context Optimization) converts hand-crafted prompt templates into learnable word vectors to automatically generate prompts, resulting in substantial improvements for CLIP. However, CoOp exhibits significant variation in classification performance across different classes. Although CoOp-CSC (Class-Specific Context) learns a separate prompt for each class, it shows advantages only on fine-grained datasets. In this paper, we propose a novel automatic prompt generation method called F-SCP (Filter-based Specific Class Prompt), which distinguishes itself from the CoOp-UC (Unified Context) model and the CoOp-CSC model. Our approach focuses on prompt generation for low-accuracy classes and similar classes. We add the Filter and SCP modules to the prompt generation architecture. The Filter module selects the poorly classified classes, and the SCP (Specific Class Prompt) module then regenerates prompts to replace those of the specific classes. Experimental results on six multi-domain datasets show the superiority of our approach over the state-of-the-art methods. In particular, the improvement in accuracy for the specific classes mentioned above is significant. For instance, compared with CoOp-UC on the OxfordPets dataset, accuracy on low-accuracy classes such as Class21 and Class26 improves by 18% and 12%, respectively.
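As background for the abstract, here is a minimal sketch of the CoOp-style mechanism it builds on and of the filtering step it describes: learnable context vectors are combined with frozen class embeddings, and classes whose per-class accuracy falls below a threshold are selected for prompt regeneration. The pooling-based text-encoder stand-in, the 0.6 threshold, and all shapes are illustrative assumptions, not the paper's F-SCP implementation.

```python
# Generic sketch of learnable prompts plus a low-accuracy class filter.
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    def __init__(self, num_classes, ctx_len=16, dim=512):
        super().__init__()
        # Shared learnable context (stand-in for "a photo of a ...")
        self.ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)
        # Frozen [CLASS] embeddings (random here; a real model would use CLIP's text encoder)
        self.class_emb = nn.Parameter(torch.randn(num_classes, dim), requires_grad=False)

    def forward(self):
        ctx = self.ctx.mean(dim=0, keepdim=True)     # (1, dim) pooled context
        prompts = ctx + self.class_emb               # (num_classes, dim) prompt features
        return nn.functional.normalize(prompts, dim=-1)

def filter_low_accuracy_classes(per_class_acc, threshold=0.6):
    """Return indices of classes below the accuracy threshold (hypothetical criterion)."""
    return [i for i, acc in enumerate(per_class_acc) if acc < threshold]

# Usage
prompt_gen = LearnablePrompt(num_classes=37)                     # e.g., OxfordPets has 37 classes
text_feats = prompt_gen()                                        # (37, 512)
img_feat = nn.functional.normalize(torch.randn(1, 512), dim=-1)
logits = img_feat @ text_feats.t()                               # cosine-similarity scores
per_class_acc = torch.rand(37).tolist()                          # placeholder accuracies
print(filter_low_accuracy_classes(per_class_acc))                # classes to regenerate prompts for
```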

Download: [Official link] [Preprint]

Keywords: Multi-modal, Vision language model, Prompt tuning, Large-scale pre-training model
Photos:

Yi Wu, Xiaoyan Jiang, Zhijun Fang, Yongbin Gao, Hamido Fujita, "Multi-modal 3D object detection by 2D-guided precision anchor proposal and multi-layer fusion," Applied Soft Computing, 2021

The paper "Multi-modal 3D object detection by 2D-guided precision anchor proposal and multi-layer fusion" by Yi Wu, a graduate student (class of 2018) in our group, has been accepted by the SCI journal Applied Soft Computing. Congratulations!

Abstract:
3D object detection, whose goal is to obtain the 3D spatial structure information of objects, is a challenging topic in many visual perception systems, e.g., autonomous driving, augmented reality, and robot navigation. Most existing region proposal network (RPN)-based 3D object detection methods generate anchors in the whole 3D search space without using semantic information, which leads to inappropriate anchor sizes. To tackle this issue, we propose a 2D-guided precision anchor generation network (PAG-Net). Specifically, we utilize a mature 2D detector to obtain 2D bounding boxes and category labels of objects as prior information. The 2D bounding boxes are then projected into 3D frustum space to produce more precise, category-adaptive 3D anchors. Furthermore, existing feature combination methods (early fusion, late fusion, and deep fusion) fuse only features from high convolutional layers and ignore the missing-data problem of point clouds. To fuse RGB image and point cloud features more effectively, we propose a multi-layer fusion model, which conducts nonlinear and iterative combinations of features from multiple convolutional layers and merges global and local features effectively. We encode the point cloud with a bird's eye view (BEV) representation to handle its irregularity. Experimental results show that our proposed approach improves the baseline by a large margin and outperforms most state-of-the-art methods on the KITTI object detection benchmark.
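For the BEV encoding mentioned above, here is a minimal sketch of one common form of it (height-slice and density channels); the grid extents, resolution, and channel layout are illustrative assumptions, not the paper's settings.

```python
# Minimal BEV encoding sketch: discretize a point cloud onto a ground-plane grid
# with per-slice height channels plus a log-density channel.
import numpy as np

def encode_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0), res=0.1,
               num_height_slices=5, z_range=(-2.5, 1.0)):
    """points: (N, 3) array of (x, y, z). Returns an (H, W, num_height_slices + 1) BEV map."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w, num_height_slices + 1), dtype=np.float32)

    # Keep only points inside the crop region
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    # Grid coordinates and height-slice index for each point
    xi = ((pts[:, 0] - x_range[0]) / res).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / res).astype(int)
    zi = ((pts[:, 2] - z_range[0]) / (z_range[1] - z_range[0]) * num_height_slices).astype(int)
    zi = np.clip(zi, 0, num_height_slices - 1)

    # Height channels: max height (above z_range[0]) per cell and slice; last channel: density
    np.maximum.at(bev, (xi, yi, zi), pts[:, 2] - z_range[0])
    np.add.at(bev[:, :, -1], (xi, yi), 1.0)
    bev[:, :, -1] = np.log1p(bev[:, :, -1])      # compress the density channel
    return bev

# Usage with random points
cloud = np.random.uniform(low=[0, -40, -2.5], high=[70, 40, 1.0], size=(10000, 3))
print(encode_bev(cloud).shape)   # (700, 800, 6)
```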

Download: [Official link] [Preprint]

Keywords: 3D object detection, Multi-modal, Autonomous driving, Feature fusion, Point cloud.

Photos: