WER on base_noise_pt_noise_ft_30h.pt
I'm trying to reproduce the decoding results from your AVSR fine-tuned model (avhubert_pretrained/model/lrs3_vox/avsr/base_noise_pt_noise_ft_30h.pt).
I believe the configuration is correct, but I couldn't get the same result on my system.

The WER of the downloaded AVSR fine-tuned checkpoint (PT type=Noisy, FT type=Noisy) is 4.29% on my system.
Did I miss something?
inference command:
python -B infer_s2s.py --config-dir conf --config-name s2s_decode.yaml dataset.gen_subset=test common_eval.path=./multimodal/avhubert_pretrained/model/lrs3_vox/avsr/base_noise_pt_noise_ft_30h.pt common_eval.results_path=fb_base_noise_pt_noise_ft_30h override.modalities=['video','audio'] common.user_dir=`pwd` override.data=./multimodal/lrs3/30h_data/ override.label_dir=./multimodal/lrs3/30h_data
s2s_decode.yaml: same as the one on GitHub

Thank you for your consideration. ;)
Hi,
You need to tune the decoding hyperparameters (mostly generation.beam and generation.lenpen) a bit. To reach that particular number, setting generation.beam=20 generation.lenpen=1 will lead to a WER of 4.1%.
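For example, with the command from your post above (same paths), just append the two overrides:
python -B infer_s2s.py --config-dir conf --config-name s2s_decode.yaml dataset.gen_subset=test common_eval.path=./multimodal/avhubert_pretrained/model/lrs3_vox/avsr/base_noise_pt_noise_ft_30h.pt common_eval.results_path=fb_base_noise_pt_noise_ft_30h override.modalities=['video','audio'] common.user_dir=`pwd` override.data=./multimodal/lrs3/30h_data/ override.label_dir=./multimodal/lrs3/30h_data generation.beam=20 generation.lenpen=1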
Hi, may I ask how you load an officially provided fine-tuned model? I'm also trying to load a fine-tuned lip-reading model. Here are my steps:
- load the provided checkpoint, replace cfg.model.w2v_args.task.data, cfg.model.w2v_args.task.label_dir, cfg.task.data, cfg.task.label_dir, and cfg.tokenizer_bpe_model with my own paths, and save the fixed checkpoint (roughly as in the sketch after this list)
- load the fixed checkpoint and run inference.
However, the resulting WER is extremely high. I guess my "dict.wrd.txt" is inconsistent with the officially provided model?
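Concretely, the patching step looks roughly like this; the paths and the exact attribute locations (e.g. where tokenizer_bpe_model sits) are assumptions on my side, and I'm using fairseq's checkpoint_utils to load the checkpoint state:

```python
# Rough sketch of the checkpoint-patching step described above (paths are placeholders).
import torch
from fairseq import checkpoint_utils

state = checkpoint_utils.load_checkpoint_to_cpu("/path/to/finetuned_checkpoint.pt")
cfg = state["cfg"]

# point the fine-tuning task config at my own data / label / BPE paths
cfg.task.data = "/path/to/my/data"
cfg.task.label_dir = "/path/to/my/labels"
cfg.task.tokenizer_bpe_model = "/path/to/my/spm.model"  # exact field location assumed

# the pretraining config nested inside the model config keeps its own copies
cfg.model.w2v_args.task.data = "/path/to/my/data"
cfg.model.w2v_args.task.label_dir = "/path/to/my/labels"

torch.save(state, "/path/to/fixed_checkpoint.pt")
```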
There is no need to modify the checkpoint to load a fine-tuned model. Just running python infer_s2s.py --args... override.data=/path/to/test-data/ override.label_dir=/path/to/test-label/ for inference should work.
Thanks for your reply!
If I don't do that, I run into an error:

I suppose that's your dict path? So I have to replace those paths to make the code work.
A saved model should be tied to a proper dictionary file. The error above is about the dictionary used for pretraining, which is not the real issue, and I can work around it to get the code running. The real problem is that, at decoding time, the IDs output by your pretrained model do not correspond exactly to my dictionary.
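To make the concern concrete: the mapping from output IDs back to symbols depends on the dictionary file, so if my dict.wrd.txt orders units differently from the one the model was fine-tuned with, decoding produces the wrong text. A hypothetical illustration using fairseq's Dictionary (both paths are placeholders):

```python
from fairseq.data import Dictionary

my_dict = Dictionary.load("/path/to/my/dict.wrd.txt")          # my unit inventory
ckpt_dict = Dictionary.load("/path/to/official/dict.wrd.txt")  # the one used for fine-tuning

# if the two files order units differently, the same output ID maps to
# different symbols, so hypotheses come out as the wrong text
i = 10
print(my_dict[i], ckpt_dict[i])
```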
Which checkpoint were you using? The dictionary is saved in the checkpoint and will be loaded automatically at test time. You can refer to a decoding example in the demo.
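For reference, a minimal sketch of loading a fine-tuned checkpoint together with its task, assuming the fairseq version pinned by this repo and that the avhubert user module is importable (paths are placeholders):

```python
from fairseq import checkpoint_utils

# the task (and its dictionary) is restored from the config stored in the checkpoint;
# only the data / label paths need to point at your own test set
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["/path/to/base_noise_pt_noise_ft_30h.pt"],
    arg_overrides={"data": "/path/to/test-data", "label_dir": "/path/to/test-label"},
)
print(saved_cfg.task)  # the task config saved inside the checkpoint
```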
I found the reason. I was using a different version of fairseq, which made the task lose the saved target-dictionary state. After commenting out the line task = tasks.setup_task(saved_cfg.task) in infer_s2s.py, the code works now.
Hi @minkyu119! I've seen that you're using the AVSR model. Did you try running it in Colab? I'm trying to modify the provided Colab notebook, but it crashes when I try it with base_noise_pt_noise_ft_30h.pt.