espnet
[WIP] S2ST recipe for SpeechMatrix
What?
Features
- [x] Enable utilizing pretrained kmeans model
- [x] Enable utilizing HiFiGAN pretrained vocoder
- [ ] Add shallow decoder for source unit estimation
- [ ] Add huggingface ASR models for ASR BLEU calculation
- [ ] Refactor: Add an option to choose between maintaining the vocabulary and filtering OOV cases
- [ ] Refactor: Connect run.sh's src_lang and tgt_lang parameters with local/data.sh
- [ ] Refactor: We have to manually skip Stage 4. Add --skip_stages or other flags for s2st.sh
- [ ] Refactor: The fairseq version currently installed by ESPnet does not support HiFiGAN
- [ ] Refactor: Handle different vocoder types more elegantly
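
For context, the pretrained k-means model mentioned in the list above is typically used to turn frame-level speech features into discrete units by nearest-centroid assignment. The sketch below illustrates that idea in plain Python. It is not the recipe's implementation: `assign_units` is a hypothetical helper, and the centroids are toy values standing in for a real pretrained codebook.

```python
from typing import List, Sequence


def assign_units(
    features: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each feature frame, the index of its nearest centroid."""
    units = []
    for frame in features:
        # Squared Euclidean distance from this frame to every centroid.
        dists = [
            sum((x - c) ** 2 for x, c in zip(frame, centroid))
            for centroid in centroids
        ]
        units.append(dists.index(min(dists)))
    return units


# Toy data (assumption): 3 centroids in 2-D and 4 frames near known centroids.
centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
features = [[0.1, -0.2], [9.5, 0.3], [0.2, 9.9], [9.9, -0.1]]
units = assign_units(features, centroids)
```

In a real pipeline the features would come from a speech encoder and the centroids from the pretrained model file; both are outside the scope of this sketch.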
Bug fixes
- [x] Fix bugs related to the case where we only have speech data for training (i.e., use_src_lang=false, use_tgt_lang=false)
- [x] Fixed fix_data_dir filtering bug (due to wav.scp.${src_lang} and wav.scp.${tgt_lang} being custom files)
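
The filtering fix concerns custom files such as wav.scp.${src_lang}, which a standard data-directory filter does not know about. Here is a minimal sketch of the underlying idea, assuming Kaldi-style files with one `utt_id value` entry per line: keep only the utterance IDs present in every file, custom ones included. The helper `filter_to_common_utts` and the in-memory dict layout are hypothetical, and this is not how fix_data_dir itself is implemented.

```python
from typing import Dict


def filter_to_common_utts(
    files: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Keep only utterance IDs that appear in every file.

    `files` maps a file name (e.g. "wav.scp.es") to its {utt_id: value} entries.
    """
    if not files:
        return {}
    # Intersect the utterance-ID sets of all files, custom ones included.
    common = set.intersection(*(set(entries) for entries in files.values()))
    # Rebuild each file in sorted utterance-ID order.
    return {
        name: {utt: entries[utt] for utt in sorted(common)}
        for name, entries in files.items()
    }


# Toy data (assumption): "utt2" is missing from the target-language wav.scp.
data = {
    "wav.scp.es": {"utt1": "a.wav", "utt2": "b.wav", "utt3": "c.wav"},
    "wav.scp.en": {"utt1": "x.wav", "utt3": "z.wav"},
    "utt2spk": {"utt1": "spk1", "utt2": "spk1", "utt3": "spk2"},
}
filtered = filter_to_common_utts(data)
```

Treating every file symmetrically means that an utterance missing from either language-specific wav.scp is dropped everywhere, which keeps the directory consistent.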
Data preparation
- [x] FLEURS data preparation
- [x] EPST data preparation
- [x] SpeechMatrix data preparation
- [x] Remove fairseq dependency on data preparation by copying & modifying
- [x] Add necessary exception handling for additional python packages
- [ ] SpeechMatrix valid/test splits
- [ ] Refactor data_prep.py (too big of a file with too many repetitions)
- [ ] Refactor run.sh to handle different test_sets per language pair
- [ ] Refactor this commit: 3af3f41
Modeling
- [x] Conducted hyperparameter tuning to find the optimal architecture
- [x] Conducted hyperparameter tuning to find the optimal learning rate