
[WIP] S2ST recipe for SpeechMatrix

Open juice500ml opened this issue 10 months ago • 0 comments

What?

Features

  • [x] Enable utilizing pretrained kmeans model
  • [x] Enable utilizing HiFiGAN pretrained vocoder
  • [ ] Add shallow decoder for source unit estimation
  • [ ] Add huggingface ASR models for ASR BLEU calculation
  • [ ] Refactor: Add option for choosing between maintaining vocabulary vs. filtering OOV cases
  • [ ] Refactor: Connect run.sh's src_lang and tgt_lang parameters with local/data.sh
  • [ ] Refactor: Stage 4 currently has to be skipped manually. Add --skip_stages or a similar flag to s2st.sh
  • [ ] Refactor: The fairseq version currently installed by ESPnet does not support HiFiGAN
  • [ ] Refactor: Handle different vocoder types more elegantly
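
For the --skip_stages item above, one possible shape for the check is sketched below. This is only an illustration, not the actual s2st.sh code: the helper name should_run_stage and the space-separated format of skip_stages are assumptions, and the stage loop is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a --skip_stages style check; not taken from s2st.sh.

# Assumed format: space-separated stage numbers that should be skipped.
skip_stages="4"

# Succeeds (returns 0) when the given stage is NOT listed in skip_stages.
should_run_stage() {
    local stage="$1"
    local s
    # Intentionally unquoted so the list splits on whitespace.
    for s in ${skip_stages}; do
        if [ "${s}" = "${stage}" ]; then
            return 1
        fi
    done
    return 0
}

# Placeholder stage loop; a real recipe would run each stage's commands here.
for stage in 1 2 3 4 5; do
    if should_run_stage "${stage}"; then
        echo "Running stage ${stage}"
    else
        echo "Skipping stage ${stage}"
    fi
done
```

With a helper like this, each stage block could be guarded by both the usual stage/stop_stage range check and should_run_stage, so Stage 4 would no longer need to be skipped by hand.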

Bug fixes

  • [x] Fix bugs related to the case where we only have speech data for training (i.e., use_src_lang=false, use_tgt_lang=false)
  • [x] Fix fix_data_dir filtering bug (caused by wav.scp.${src_lang} and wav.scp.${tgt_lang} being custom files)

Data preparation

  • [x] FLEURS data preparation
  • [x] EPST data preparation
  • [x] SpeechMatrix data preparation
  • [x] Remove the fairseq dependency from data preparation by copying & modifying the needed code
  • [x] Add necessary exception handling for additional python packages
  • [ ] SpeechMatrix valid/test splits
  • [ ] Refactor data_prep.py (the file is too large and too repetitive)
  • [ ] Refactor run.sh to handle different test_sets per language pairs
  • [ ] Refactor this commit: 3af3f41

Modeling

  • [x] Conducted hyperparameter tuning to find the optimal architecture
  • [x] Conducted hyperparameter tuning to find the optimal learning rate

Why?

See also

juice500ml, Apr 05 '24 21:04