
Collection of works from VIPL-AVSU

This is a collection of works from the Audio-Visual Speech Understanding Group at VIPL.

Our group website is here.

Recent News:

[2024-02]: One paper is accepted by CVPR 2024! Congratulations to Yuanhang!

[2023-08]: Three papers are accepted by BMVC 2023! Congratulations to Bing-quan, Song-tao and Fei-xiang!

[2022-06]: We won the AVA Active Speaker Challenge @ CVPR 2022 again! More details can be found here. Congratulations to Yuanhang and Susan!

[2022-03]: One paper is accepted by ICPR 2022! Congratulations to Dalu!

[2021-07]: One paper is accepted by ICME Workshop 2021! Congratulations to Dalu!

[2021-07]: One paper is accepted by ACM MM 2021! Congratulations to Yuanhang and Susan!

[2021-06]: Champion of the AVA Active Speaker Challenge @ CVPR 2021! More details can be found here. Congratulations to Yuanhang and Susan!

Datasets

LRW-1000: A naturally-distributed large-scale benchmark for lip reading in the wild, FG 2019

The largest Mandarin word-level audio-visual speech recognition dataset (as of 2022), also known as CAS-VSR-W1k.

Note: If you cannot open the dataset website, you can refer to the paper page for details about the data, and then download the agreement file here in this repository if you plan to use this dataset for your research. Please read the agreement carefully and fill it out completely. Note that the agreement must be signed by a full-time staff member (students are not eligible to sign). Then, please scan the signed agreement and send it to [email protected]. Once we receive it, we will send you the download link as soon as possible.

Challenges

2022 World Robot Contest - Tri-Co Robots Challenge - Speech Recognition Technology Competition (2022 世界机器人大赛-共融机器人挑战赛-语音识别技术赛)

  • Homepage: here
  • Date: 2022/06-2022/12
  • Registration is open; everyone is welcome to sign up!

2019: The 1st Mandarin Audio-Visual Speech Recognition Challenge (MAVSR)

This challenge aims to explore the complementarity between visual and acoustic information in real-world speech recognition systems.
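
To make this complementarity concrete, here is a minimal, illustrative late-fusion sketch in PyTorch. It is not the challenge baseline or any released code from our group; the `AVFusionClassifier` name, the GRU encoders, and all dimensions (40-dim audio features, 512-dim lip-region features, 1000 word classes) are assumptions chosen only for illustration.

```python
# Illustrative sketch only: a minimal late-fusion audio-visual word classifier.
# Encoders, dimensions, and class count are placeholders, not the MAVSR baseline.
import torch
import torch.nn as nn


class AVFusionClassifier(nn.Module):
    def __init__(self, audio_dim=40, video_dim=512, hidden_dim=256, num_classes=1000):
        super().__init__()
        # Per-modality temporal encoders: (B, T, D_in) -> (B, T, hidden_dim)
        self.audio_rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.video_rnn = nn.GRU(video_dim, hidden_dim, batch_first=True)
        # Fusion head: concatenate the two temporal averages and classify the word.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, audio_feats, video_feats):
        a, _ = self.audio_rnn(audio_feats)   # (B, T_a, H)
        v, _ = self.video_rnn(video_feats)   # (B, T_v, H)
        fused = torch.cat([a.mean(dim=1), v.mean(dim=1)], dim=-1)  # (B, 2H)
        return self.classifier(fused)        # word logits


if __name__ == "__main__":
    model = AVFusionClassifier()
    audio = torch.randn(2, 100, 40)   # e.g. 100 frames of 40-dim filterbank features
    video = torch.randn(2, 25, 512)   # e.g. 25 frames of lip-region CNN features
    print(model(audio, video).shape)  # torch.Size([2, 1000])
```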

Publications

  • Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan, "UniCon+: ICTCAS-UCAS-TAL Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022", The ActivityNet Large-Scale Activity Recognition Challenge at CVPR 2022 (1st Place).

  • Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen, "Audio-Driven Deformation Flow for Effective Lip Reading", ICPR 2022

  • Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, "ICTCAS-UCAS-TAL Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2021", The ActivityNet Large-Scale Activity Recognition Challenge at CVPR 2021 (1st Place). [PDF]

  • Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen, "UniCon: Unified Context Network for Robust Active Speaker Detection", ACM MM 2021 (Oral). [Website] | [PDF]

  • Dalu Feng, Shuang Yang, Shiguang Shan, Xilin Chen, "Learn an Effective Lip Reading Model without Pains", ICME Workshop 2021
    [PDF] | [code]

  • Mingshuang Luo, Shuang Yang, Shiguang Shan, Xilin Chen, "Synchronous Bidirectional Learning for Multilingual Lip Reading", BMVC 2020
    [PDF] | [code]

  • Jingyun Xiao, Shuang Yang, Yuanhang Zhang, Shiguang Shan, Xilin Chen, "Deformation Flow Based Two-Stream Network for Lip Reading", FG 2020
    [PDF] | [code]

  • Xing Zhao, Shuang Yang, Shiguang Shan, Xilin Chen, "Mutual Information Maximization for Effective Lipreading", FG 2020
    [PDF] | [code]

  • Yuanhang Zhang, Shuang Yang, Jingyun Xiao, Shiguang Shan, Xilin Chen, "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition", FG 2020 (Oral)
    [PDF] | [code]

  • Mingshuang Luo, Shuang Yang, Shiguang Shan, Xilin Chen, "Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading", FG 2020
    [PDF]

  • Yuanhang Zhang, Jingyun Xiao, Shuang Yang, Shiguang Shan, "Multi-Task Learning for Audio-Visual Active Speaker Detection", CVPR ActivityNet Challenge 2019
    [PDF]

  • Shuang Yang, Yuanhang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jingyun Xiao, Keyu Long, Shiguang Shan, Xilin Chen, "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", FG 2019
    [PDF] | [Dataset] | Code@fengdalu | Code@NirHeaven