Advisor Profile

Xiaoyan Jiang, Associate Professor and Master's supervisor, received her Ph.D. in Computer Science from Friedrich Schiller University Jena, Germany.

Xiaoyan Jiang has been an Associate Professor at Shanghai University of Engineering Science since January 2020. She is a Visiting Scholar at the Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands. She received her Ph.D. in Computer Science from Friedrich Schiller University Jena, Germany, in 2015. She has published more than 60 papers in the fields of computer vision and artificial intelligence, and has served as an Associate Editor of Applied Intelligence (IF: 3.9) since 2024.


Courses: Computer Vision, Machine Learning, Digital Image Processing, Object-Oriented Programming in C++
Research Topics: Computer Vision, Machine Learning, Reasoning and Application of Large Language Models
Links: Google Scholar, GitHub, Visiting Scholar Homepage at Leiden University
E-mail: xiaoyan.jiang@sues.edu.cn


Her research covers computer vision and deep learning, with applications in video surveillance, computer-aided medical analysis, industrial inspection, scene understanding, and intelligent transportation. She has published more than 60 papers in computer vision and artificial intelligence, over 50 of them SCI/EI-indexed, in venues including IEEE Trans. SMC, IEEE Trans. ITS, Pattern Recognition, Knowledge-Based Systems (KBS), SPIC, ICIP, ICONIP, and ICME. She reviews for several top international conferences and journals, served on the program committees of ICPCSEE 2019 and IEA/AIE 2023, and gave keynote talks at the international conferences CiSE 2023, IWITC 2021, and ICFTIC 2019. She serves as an Associate Editor of Applied Intelligence. Her work has been supported by the German DAAD and the China Scholarship Council (CSC). She is a member of a collective awarded the university's May Fourth Youth Medal, a core member of the Intelligent Visual Perception and Information Processing innovation team (fourth round of Shanghai Changning District innovation teams), and has been recognized as an outstanding teacher of her school. She has led or participated in many projects, including an NSFC Young Scientists Fund project, a civil aviation key project, an NSFC General Program project, Shanghai Municipal Education Commission projects, a Shanghai Science and Technology Commission key project, and projects with Shanghai Aircraft Manufacturing Co., Ltd. She has filed eight invention patents, five utility model patents, and several software copyrights.


She currently leads the Multi-dimensional Artificial Intelligence research team at the School of Electronic and Electrical Engineering. Alongside its research and development work, the team actively promotes the industrialization of AI technology. It carries out industry-academia-research cooperation with enterprises across sectors, taking 5G + AI as the model for the future, and has achieved results in intelligent transportation, 3D scene modeling, video surveillance, defect inspection, smart healthcare, and other areas, with deployments in real-world scenarios. The team's research spans multiple topics in computer vision: multi-object tracking, domain-adaptive person re-identification, semantic segmentation, and visual SLAM. Its projects have been applied to intelligent transportation, video surveillance, paint defect detection on large passenger aircraft surfaces, industrial defect detection, detection of lymph node metastasis in gastric cancer, nystagmus diagnosis, cardiac cycle analysis, and more.


The team is centered on student development: it builds a solid foundation in the key knowledge and theory spanning traditional vision algorithms, deep learning, and large models, and, through real-world scenarios, cultivates students' ability to think independently and to identify and solve problems. The goal is to spark a lasting inner drive for lifelong learning, so that the team grows and develops together. If you hold yourself to high standards, are curious about science, and are willing to work hard to solve problems, this team is for you. Welcome to join!


Xiaoyan Jiang, Bohan Wang, Xinlong Wan, Shanshan Chen, H. Fujita*, and Hanan Abd. Al Juaid, 2025

The paper "Project-and-Fuse: Improving RGB-D Semantic Segmentation via Graph Convolution Networks" has been accepted by Information Sciences (2025). Congratulations!

Abstract:
Most existing RGB-D semantic segmentation methods focus on feature-level fusion, including complex cross-modality and cross-scale fusion modules. However, these methods may cause misalignment problems in the feature fusion process and counter-intuitive patches in the segmentation results. Inspired by the popular pixel-node-pixel pipeline, we propose to 1) fuse features from the two modalities in a late-fusion style, during which the geometric feature injection is guided by a texture feature prior; and 2) employ Graph Neural Networks (GNNs) on the fused features to alleviate the emergence of irregular patches by inferring patch relationships. At the 3D feature extraction stage, we argue that traditional CNNs are not efficient enough for depth maps, so we encode each depth map into a normal map, after which CNNs can easily extract object surface information. At the projection matrix generation stage, we identify Biased-Assignment and Ambiguous-Locality issues in the original pipeline. We therefore adopt a Kullback-Leibler loss to ensure that no important pixel features are missed, which can be viewed as a hard pixel mining process, and we connect regions that are close to each other in both Euclidean space and semantic space with larger edge weights so that location information is taken into account. Extensive experiments on two public datasets, NYU-DepthV2 and SUN RGB-D, show that our approach consistently boosts the performance of the RGB-D semantic segmentation task.
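
To make the graph-construction idea in the abstract more concrete (connecting regions that are close in both Euclidean and semantic space with larger edge weights), below is a minimal Python/PyTorch sketch of that general pattern. It is an illustrative example only, not the paper's implementation; the function name build_patch_graph and the bandwidths sigma_xy and sigma_sem are hypothetical.

    # Illustrative sketch (not the paper's code): build a patch graph whose
    # edge weights grow when two regions are close in BOTH Euclidean (x, y)
    # space and semantic (feature) space, as the abstract describes.
    import torch

    def build_patch_graph(centers: torch.Tensor, feats: torch.Tensor,
                          sigma_xy: float = 0.1, sigma_sem: float = 0.5) -> torch.Tensor:
        """centers: (N, 2) normalized patch centroids; feats: (N, C) patch features.
        Returns a dense (N, N) adjacency matrix of Gaussian affinities."""
        d_xy = torch.cdist(centers, centers)          # pairwise Euclidean distances
        f = torch.nn.functional.normalize(feats, dim=1)
        d_sem = 1.0 - f @ f.t()                       # cosine distance in feature space
        # Larger edge weight only when patches are close in both spaces.
        adj = torch.exp(-d_xy / sigma_xy) * torch.exp(-d_sem / sigma_sem)
        adj.fill_diagonal_(0.0)                       # no self-loops
        return adj

    # Toy usage: 6 random patches.
    adj = build_patch_graph(torch.rand(6, 2), torch.randn(6, 16))
    print(adj.shape)  # torch.Size([6, 6])

Multiplying the two Gaussian affinities means a GNN message-passing step over this adjacency favors neighbors that agree in both location and semantics, which matches the locality motivation stated in the abstract.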

Download: [preprint version]

Keywords: RGB-D, semantic segmentation, GCN, late fusion

Xiaoyan Jiang, Zhi Zhou, Hailing Wang, Guozhong Wang, and Zhijun Fang, 2025

The team's paper "TexLiverNet: Leveraging Medical Knowledge and Spatial-Frequency Perception for Enhanced Liver Tumor Segmentation" has been accepted by the 2025 IEEE International Symposium on Biomedical Imaging (ISBI), a leading IEEE medical imaging conference. Congratulations!

Abstract:
Integrating textual data with imaging in liver tumor segmentation is essential for enhancing diagnostic accuracy. However, current multi-modal medical datasets offer only general text annotations, lacking lesion-specific details critical for extracting nuanced features, especially for fine-grained segmentation of tumor boundaries and small lesions. To address these limitations, we developed datasets with lesion-specific text annotations for liver tumors and introduced the TexLiverNet model. TexLiverNet employs an agent-based cross-attention module that integrates text features efficiently with visual features, significantly reducing computational costs. Additionally, enhanced spatial and adaptive frequency domain perception is proposed to precisely delineate lesion boundaries, reduce background interference, and recover fine details in small lesions. Comprehensive evaluations on public and private datasets demonstrate that TexLiverNet achieves superior performance compared to current state-of-the-art methods.
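
To illustrate what an "agent-based cross-attention module" can look like in general, here is a hedged PyTorch sketch of the common agent-attention pattern: a small set of learned agent tokens first summarizes the text features, and the visual tokens then attend only to those agents, cutting the cost from O(N_vis * N_txt) to roughly O((N_vis + N_txt) * K) for K agents. The module name AgentCrossAttention and all hyper-parameters are illustrative assumptions, not TexLiverNet's actual design.

    # Illustrative sketch of agent-based cross-attention (not TexLiverNet's code).
    import torch
    import torch.nn as nn

    class AgentCrossAttention(nn.Module):  # hypothetical module name
        def __init__(self, dim: int, num_agents: int = 16):
            super().__init__()
            self.agents = nn.Parameter(torch.randn(1, num_agents, dim) * 0.02)
            self.pool = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.broadcast = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
            """vis: (B, N_vis, C) visual tokens; txt: (B, N_txt, C) text tokens."""
            agents = self.agents.expand(vis.size(0), -1, -1)
            # Stage 1: agents gather linguistic context from the text tokens.
            agents, _ = self.pool(agents, txt, txt)
            # Stage 2: visual tokens read the summarized context from the agents.
            fused, _ = self.broadcast(vis, agents, agents)
            return vis + fused  # residual fusion

    # Toy usage.
    m = AgentCrossAttention(dim=64)
    out = m(torch.randn(2, 196, 64), torch.randn(2, 20, 64))
    print(out.shape)  # torch.Size([2, 196, 64])

With K much smaller than the number of visual tokens, this two-stage routing is the kind of computational saving the abstract refers to.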

Download: [preprint version]

Keywords: liver tumor segmentation, medical text annotations, cross-attention, spatial-frequency perception

Xiaoyan Jiang, Hang Yang, Kaiying Zhu, Xihe Qiu, Shibo Zhao, Sifan Zhou, 2024

The team's paper "PTQ4RIS: Post-Training Quantization for Referring Image Segmentation" has been accepted by the 2025 IEEE International Conference on Robotics and Automation (ICRA). Congratulations!

Abstract:
Referring Image Segmentation (RIS) aims to segment the object referred to by a given sentence in an image by understanding both visual and linguistic information. However, existing RIS methods tend to pursue top-performing models while disregarding practical deployment on resource-limited edge devices. This oversight poses a significant challenge for on-device RIS inference. To this end, we propose an effective and efficient post-training quantization framework termed PTQ4RIS. Specifically, we first conduct an in-depth analysis of the root causes of performance degradation in RIS model quantization, and then propose dual-region quantization (DRQ) and reorder-based outlier-retained quantization (RORQ) to address the quantization difficulties in the visual and text encoders. Extensive experiments on three benchmarks with different bit settings (from 8 down to 4 bits) demonstrate its superior performance. Importantly, PTQ4RIS is the first PTQ method specifically designed for the RIS task, highlighting the feasibility of PTQ in RIS applications.
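
For readers unfamiliar with post-training quantization, the following toy PyTorch sketch contrasts single-scale uniform fake-quantization with a "dual-region" variant that quantizes the two sides of a skewed activation distribution with separate scales. It only illustrates the general motivation behind region-wise quantization; it is not the paper's DRQ or RORQ implementation, and the zero-split rule and function names are assumptions for the example.

    # Illustrative sketch of the dual-region idea in post-training quantization
    # (not the paper's DRQ): give each side of a skewed activation distribution
    # its own uniform quantization scale instead of one scale for the whole tensor.
    import torch

    def quantize_uniform(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
        """Symmetric uniform fake-quantization: quantize, then dequantize."""
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax, qmax) * scale

    def quantize_dual_region(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
        """Quantize the two sides of zero with separate scales (toy split rule)."""
        pos, neg = x.clamp(min=0), x.clamp(max=0)
        return quantize_uniform(pos, bits) + quantize_uniform(neg, bits)

    # Toy comparison on an asymmetric, long-tailed distribution.
    x = torch.cat([torch.rand(1000) * 6.0, -torch.rand(1000) * 0.5])
    for fn in (quantize_uniform, quantize_dual_region):
        err = (fn(x, bits=4) - x).pow(2).mean()
        print(fn.__name__, float(err))

At low bit widths, the single shared scale is dominated by the long positive tail and wastes resolution on the narrow negative side, which is the kind of quantization difficulty that motivates region-wise schemes.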

Download: [preprint version]

Keywords: PTQ, Referring Image Segmentation