fairseq: Cannot reproduce the result of ASR task on CoVoST 2 dataset (Indonesian)

Open florallam opened this issue 3 years ago • 0 comments

❓ Questions and Help

What is your question?

Hello! I have been trying to train a fairseq ASR model for Indonesian and have not reached a breakthrough. Training consistently yields poor WER scores above 100 (the expected WER is around 20-30). The inference output is also perplexing: certain terms and phrases are repeated even though they do not remotely resemble the original text. I really hope someone can provide some guidance on this. Any help is appreciated, thank you!
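For context on how a WER above 100 can arise: WER is the word-level edit distance divided by the number of reference words, so a hypothesis that is longer than the reference (for example, one padded with repeated phrases) adds insertions and can push the score past 100. The following is a minimal sketch of that calculation, not fairseq's own scorer; the function names and the Indonesian example sentences are hypothetical and chosen only for illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER in percent: edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical example: a 4-word reference against a 6-word repetitive output.
# Every reference word is substituted and two extra words are inserted.
print(wer("saya suka makan nasi", "dia dia dia dia dia dia"))  # 150.0
```

A score above 100 therefore suggests that the hypotheses are, on average, longer than the references or share almost no words with them, which is consistent with the repetition described above.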

Code

fairseq-train data/indo/cv-corpus-9.0-2022-04-27/id ^
  --config-yaml config_asr_id.yaml --train-subset train_asr_id --valid-subset dev_asr_id ^
  --save-dir data/indo/asr --num-workers 4 --max-tokens 50000 --max-update 60000 ^
  --task speech_to_text --criterion label_smoothed_cross_entropy --label-smoothing 0.1 ^
  --report-accuracy --arch s2t_transformer_s --dropout 0.15 --optimizer adam --lr 2e-3 ^
  --lr-scheduler inverse_sqrt --warmup-updates 10000 --clip-norm 10.0 --seed 1 --update-freq 8

fairseq-generate data/indo/cv-corpus-9.0-2022-04-27/id ^
  --config-yaml config_asr_id.yaml --gen-subset test_asr_id --task speech_to_text ^
  --path data/indo/asr/avg_last_10_checkpoint.pt --max-tokens 50000 --beam 5 ^
  --scoring wer --wer-tokenizer 13a --wer-lowercase --wer-remove-punct ^
  --results-path data/indo/cv-corpus-9.0-2022-04-27/id/results
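To quantify the repetition in the output of the generate command above, a rough sketch like the one below may help. It assumes the results file contains tab-separated lines of the form `T-<id>` followed by the reference and `D-<id>` followed by a score and the detokenized hypothesis; that layout is an assumption about the generate output and should be checked against the actual file. The function names and sample lines are hypothetical.

```python
from collections import Counter


def parse_generate_lines(lines):
    """Collect references (T- lines) and hypotheses (D- lines) by sample id.

    Assumed line layout (tab-separated), to be verified against the real file:
      T-<id>  <reference>
      D-<id>  <score>  <hypothesis>
    """
    refs, hyps = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tag, _, sample_id = fields[0].partition("-")
        if tag == "T" and len(fields) >= 2:
            refs[sample_id] = fields[1]
        elif tag == "D" and len(fields) >= 3:
            hyps[sample_id] = fields[2]
    return refs, hyps


def repeated_bigram_ratio(text):
    """Fraction of word bigrams that repeat an earlier bigram in the text."""
    words = text.split()
    bigrams = list(zip(words, words[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(bigrams)


# Hypothetical sample lines, not real model output.
sample = [
    "T-0\tsaya suka makan nasi",
    "D-0\t-0.5\tdi sana di sana di sana",
]
refs, hyps = parse_generate_lines(sample)
for sample_id, hyp in hyps.items():
    print(sample_id, refs.get(sample_id), repeated_bigram_ratio(hyp))
```

A high ratio across most test samples would indicate that the decoder is looping on a few phrases rather than making isolated recognition errors.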

What have you tried?

I have tried tuning the dropout values and learning rates, but the improvement in terms of WER has been marginal.

What's your environment?

fairseq Version: 1.0
PyTorch Version (e.g., 1.0):
OS: Windows
How you installed fairseq (pip, source): git pull fairseq repository
Build command you used (if compiling from source): pip
Python version: 3.8.13
CUDA/cuDNN version:
GPU models and configuration: Nvidia GeForce RTX 3080
Any other relevant information:

Jul 22 '22 07:07 florallam
