Visual_Speech_Recognition_for_Multiple_Languages

pre-trained VSR / ASR model

LindgeW opened this issue 1 year ago

As mentioned in S3, the pre-trained models are always trained on the same data as the full model (though I do not know the pre-training details), and in particular the pre-trained VSR model has exactly the same architecture as the full one. So I wonder why the supervised signals (e.g., intermediate representations) from the pre-trained VSR model still provide useful supervision. Could you give an in-depth explanation?
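For context, here is a minimal sketch of the kind of setup the question refers to: a frozen, pre-trained VSR model (same architecture, trained on the same data) supplies intermediate representations that are matched by an auxiliary loss alongside the main recognition objective. This is not the repository's actual code; all module names, shapes, and loss weights below are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVSR(nn.Module):
    """Stand-in for a VSR encoder; teacher and student share this architecture."""

    def __init__(self, feat_dim=256, vocab_size=40):
        super().__init__()
        self.frontend = nn.Linear(96 * 96, feat_dim)        # placeholder visual front-end
        self.encoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, vocab_size)   # per-frame token logits

    def forward(self, frames):
        # frames: (batch, time, 96*96) flattened lip crops
        x = self.frontend(frames)
        hidden, _ = self.encoder(x)          # intermediate representation
        logits = self.classifier(hidden)
        return hidden, logits


# Teacher: pre-trained (here just randomly initialised for the sketch) and frozen.
teacher = TinyVSR()
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Student: identical architecture, trained with an extra feature-matching loss.
student = TinyVSR()

frames = torch.randn(2, 30, 96 * 96)         # dummy batch: 2 clips, 30 frames each
targets = torch.randint(0, 40, (2, 30))      # dummy frame-level token targets

with torch.no_grad():
    t_hidden, _ = teacher(frames)

s_hidden, s_logits = student(frames)

# Main recognition loss plus an auxiliary loss that pulls the student's
# intermediate representations toward the frozen teacher's.
ce_loss = F.cross_entropy(s_logits.reshape(-1, 40), targets.reshape(-1))
feat_loss = F.l1_loss(s_hidden, t_hidden)
loss = ce_loss + 10.0 * feat_loss            # weighting is illustrative only
loss.backward()
print(f"ce={ce_loss.item():.3f}  feat={feat_loss.item():.3f}")
```

The question above is essentially about why the `feat_loss` term helps when the teacher saw no extra data and has no architectural advantage over the student.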

LindgeW · Sep 12 '23 03:09