Enxin Song is a master's student at Zhejiang University in the CVNext Lab, advised by Prof. Gaoang Wang. She is fortunate to intern at the Media Computing Group, Microsoft Research Asia, advised by Dr. Xun Guo. Her research interests include domain adaptation, video understanding, and multi-modality learning.


Selected Publications:

Also see Google Scholar.

  • MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
    Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Xun Guo, Tian Ye, Yan Lu, Jenq-Neng Hwang, Gaoang Wang✉

    [Paper] [Code] NPM
    A novel framework that integrates vision models and LLMs for long video understanding tasks.



Updates:

  • Apr. 2024: We are hosting the CVPR 2024 Long-form Video Understanding Challenge @ LOVEU.

  • Feb. 2024: Our paper MovieChat: From Dense Token to Sparse Memory for Long Video Understanding was accepted to the Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

  • Nov. 2023: Became a research intern at Microsoft Research Asia (MSRA), advised by Principal Researcher Xun Guo.

  • Sep. 2023: Our paper Devil in the Number: Towards Robust Multi-modality Data Filter was accepted to the ICCV 2023 workshop TNGCV-DataComp.

  • Jul. 2023: Our project MovieChat: From Dense Token to Sparse Memory for Long Video Understanding was released on the project website.

  • Jun. 2023: I graduated from Dalian University of Technology.

  • Oct. 2022: Started my research on domain adaptation for image captioning, advised by Prof. Gaoang Wang.