Visual_Speech_Recognition_for_Multiple_Languages

Which method do you use to extract landmarks from images

1

Which method do you use to extract landmarks from images?

Winfredy

Training Code Required

2

Thanks for the release. Will the training code be made available in the future? Thanks.

Fengdalu

GRID dataset

Could you please share the original GRID dataset? Some items are missing online.

isxuwl02

CMU-MOSEAS Dataset

4

Do you still have a copy of the CMU-MOSEAS dataset? I've been informed by the authors that they lost all copies of it. If you still have a copy I...

spaghettiSystems

Version issues

3

Could you please tell me which version of PyTorch you are using? I get some errors with version 2.0. Thank you!

jayden-leo

How does this compare to projects like SBL_For_Multilingual_Lip_Reading, which use phonemes for multilingual lip reading?

TheMakerOfWorlds

How to deal with a dataset where the mouth region is already cropped

1

Hi, I would like to ask how I can deal with data in which the mouth region is already cropped, with varying sizes. I want to apply all pre-processing and data augmentation steps on this...

SHAIIM04

pre-trained VSR / ASR model

As mentioned in S3, the pre-trained models are always trained on the same data as the full model (yet I do not know the pre-training details), and especially the pre-trained...

LindgeW

How are multiple datasets combined during training?

Hi, thanks for this great work. I have a question about the section `"3.8 Using Additional Training Data"` from your paper `"Visual Speech Recognition for Multiple Languages in the Wild"`...

DomhnallBoyle

Is there an audio-visual Chinese model?

Thanks for releasing the awesome work! I noticed that the Chinese lip reading model is based on the visual modality. I used the visual model but it achieved poor performance...

cooelf
