I am a final-year Ph.D. student in the Successive Postgraduate and Doctoral Program at the CFM lab of the University of Electronic Science and Technology of China (UESTC), supervised by Prof. Heng Tao Shen and co-supervised by Prof. Fumin Shen and Prof. Xing Xu. I am also a visiting Ph.D. student at the MReal Lab at Nanyang Technological University, under the supervision of Prof. Hanwang Zhang. Before that, I earned my bachelor's degree in Software Engineering from UESTC in 2020, where I was recognized as an Honor Graduate.
Over the past five years, I have published more than 20 papers in flagship AI/multimedia conferences and ACM/IEEE Transactions journals, including IEEE TPAMI, TMM, TNNLS, TCSVT, TFS, CVPR, ACM MM, AAAI, etc. I also serve as a journal reviewer for IEEE TPAMI, TIP, TMM, TCSVT, ACM TOIS, TOMM, etc., and as a Program Committee member for CVPR, ICCV, ECCV, ACM MM, AAAI, WWW, ICASSP, BMVC, etc.
Currently, I'm working on multimodal learning, spatial intelligence (particularly from an egocentric view), and VLM/LLM reasoning enhancement. I am also highly interested in aerial and embodied multimodal perception and understanding. Feel free to contact me for discussion and collaboration!
Email: xun_jiang@outlook.com
Google Scholar  /  GitHub
Dec. 2024 - Dec. 2025 | Nanyang Technological University (NTU), Singapore
Dec. 2020 - Present | University of Electronic Science and Technology of China (UESTC), China
Sep. 2016 - Jun. 2020 | University of Electronic Science and Technology of China (UESTC), China
|
Multimodal Learning on Low-Quality Data with Conformal Predictive Self-Calibration Xun Jiang, Yufan Gu, Disen Hu, Yuqing Hou, Yazhou Yao, Fumin Shen, Heng Tao Shen, Xing Xu IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2026 [Paperlink], [Code] Key Words: Multimodal Learning, Low-quality Multimodal Data, Conformal Prediction |
|
|
Tracking Environmental Noise for Low-altitude Aircraft with Multimodal Acoustic Field Synthesis Qichen Tan, Kexin Sun, Xun Jiang# (# Corresponding Author) IEEE/CVF Conference on Computer Vision and Pattern Recognition Findings, CVPR Findings 2026 [Paperlink], [Repo] Key Words: UAV Acoustic Field, Multimedia Synthesis, Scientific Visualization |
|
|
Generalizable Egocentric Task Verification Via Cross-Modal Hybrid Hypergraph Matching
Xun Jiang, Xing Xu, Zheng Wang, Jingkuan Song, Fumin Shen, Heng Tao Shen IEEE Transactions on Pattern Analysis and Machine Intelligence, TPAMI 2026 [Paperlink], [Repo] Key Words: Egocentric Vision, Cross-modal Task Verification, Hypergraph Learning |
|
|
Hyper-Opinion Vagueness Quantification for Robust Multimodal Learning Disen Hu, Xun Jiang, Xiaofeng Cao, Zheng Wang, Jingkuan Song, Heng Tao Shen, Xing Xu AAAI Conference on Artificial Intelligence, AAAI 2026 [Paperlink], [Code] Key Words: Multimodal Learning, Robust Multimodal Learning, Hyper-Opinion Vagueness |
|
|
Geometric Gradient Divergence Modulation for Imbalanced Multimodal Learning Disen Hu, Xun Jiang, Zhe Sun, Hao Yang, Chong Peng, Peng Yan, Heng Tao Shen, Xing Xu ACM International Conference on Multimedia, ACM MM 2025 [Paperlink], [Code] Key Words: Multimodal Learning, Imbalanced Multimodal Learning, Hyperspace Polyhedron |
|
Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos Xun Jiang, Zhiyi Huang, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 [Paperlink], [Code] Key Words: Natural Language-based Egocentric Task Verification; Heterogeneous Graph Completion; Multimodal Learning; Procedural Task Understanding |
|
|
Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding Xun Jiang, Zhuoyuan Wei, Shenshen Li, Xing Xu, Jingkuan Song, Heng Tao Shen ACM International Conference on Multimedia, ACM MM 2024 [Paperlink], [Code] Key Words: Video Content Understanding; De-biased Video Grounding; Counterfactual Reasoning; Multimodal Learning |
|
Zero-Shot Video Moment Retrieval with Angular Reconstructive Text Embeddings Xun Jiang, Xing Xu, Zailei Zhou, Yang Yang, Fumin Shen, Heng Tao Shen IEEE Transactions on Multimedia, TMM 2024 [Paperlink], [Code] Key Words: Video Content Understanding; Weakly-Supervised Learning; CLIP |
|
Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion Zixian Gao*, Xun Jiang* (* Equal Contribution), Xing Xu, Fumin Shen, Yujie Li, Heng Tao Shen IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 [Paperlink], [Code] Key Words: Multimodal Learning; Model Robustness; Uncertainty in Deep Learning |
|
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval Zhiguo Chen*, Xun Jiang* (* Equal Contribution), Xing Xu, Zuo Cao, Yijun Mo, Heng Tao Shen ACM International Conference on Multimedia, ACM MM 2023 [Paperlink], [Code] Key Words: Multimedia Retrieval; Video Content Understanding; Multimodal Learning |
|
Multi-Grained Attention Network with Mutual Exclusion for Composed Query-Based Image Retrieval Shenshen Li, Xing Xu, Xun Jiang, Fumin Shen, Xin Liu, Heng Tao Shen IEEE Transactions on Circuits and Systems for Video Technology, TCSVT 2023 [Paperlink], [Code] Key Words: Cross-modal Retrieval; Composed Query-Based Image Retrieval; Multimodal Learning |
|
Faster Video Moment Retrieval with Point-Level Supervision Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen ACM International Conference on Multimedia, ACM MM 2023 [Paperlink], [Code] Key Words: Video Content Retrieval; Point-level Supervision; Retrieval Efficiency |
|
SDN: Semantic Decoupling Network for Temporal Language Grounding Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo Cao, Heng Tao Shen IEEE Transactions on Neural Networks and Learning Systems, TNNLS 2022 [Paperlink], [Code] Key Words: Video Content Understanding; Vision-Language; Multimodal Learning |
|
DHHN: Dual Hierarchical Hybrid Network for Weakly-Supervised Audio-Visual Video Parsing Xun Jiang, Xing Xu, Zhiguo Chen, Jingran Zhang, Jingkuan Song, Fumin Shen, Huimin Lu, Heng Tao Shen ACM International Conference on Multimedia, ACM MM 2022 [Paperlink], [Code] Key Words: Video Content Understanding; Action Localization; Audio-Visual Learning |
|
Semi-Supervised Video Paragraph Grounding With Contrastive Encoder
Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo Cao, Heng Tao Shen IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 [Paperlink] Key Words: Video Content Understanding; Semi-Supervised Learning; Multimodal Learning |