espnet [WIP] S2ST recipe for SpeechMatrix

Open juice500ml opened this issue 10 months ago • 0 comments

Features

[x] Enable use of a pretrained k-means model
[x] Enable use of a pretrained HiFiGAN vocoder
[ ] Add a shallow decoder for source unit estimation
[ ] Add Hugging Face ASR models for ASR-BLEU calculation
[ ] Refactor: Add an option to choose between maintaining the vocabulary and filtering OOV cases
[ ] Refactor: Connect run.sh's src_lang and tgt_lang parameters with local/data.sh
[ ] Refactor: Stage 4 currently has to be skipped manually. Add --skip_stages or a similar flag to s2st.sh
[ ] Refactor: The fairseq version currently installed by ESPnet does not support HiFiGAN
[ ] Refactor: Handle different vocoder types more elegantly
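
A rough sketch of how the --skip_stages item above could work, assuming s2st.sh keeps the usual stage/stop_stage integer variables. The skip_stages variable and the should_run helper are hypothetical names introduced here for illustration; they are not taken from the current s2st.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: skip_stages and should_run are illustrative names,
# not part of the existing s2st.sh.
set -euo pipefail

stage=1          # first stage to run
stop_stage=5     # last stage to run
skip_stages="4"  # assumed option value: space-separated stage numbers to skip

# Return 0 if stage $1 lies within [stage, stop_stage] and is not listed
# in skip_stages; return 1 otherwise.
should_run() {
    local s=$1
    local skipped
    if [ "${s}" -lt "${stage}" ] || [ "${s}" -gt "${stop_stage}" ]; then
        return 1
    fi
    for skipped in ${skip_stages}; do
        if [ "${skipped}" = "${s}" ]; then
            return 1
        fi
    done
    return 0
}

# Each stage body would be guarded by the helper instead of the plain range check.
for s in 1 2 3 4 5; do
    if should_run "${s}"; then
        echo "Running stage ${s}"
    fi
done
```

With the values above, the loop reports stages 1, 2, 3 and 5, so Stage 4 no longer needs to be skipped by hand with separate --stage/--stop_stage invocations.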

Bug fixes

[x] Fix bugs related to the case where we only have speech data for training (i.e., use_src_lang=false, use_tgt_lang=false)
[x] Fix the fix_data_dir filtering bug (caused by wav.scp.${src_lang} and wav.scp.${tgt_lang} being custom files)

Data preparation

Modeling