# simulst
PyTorch toolkit for streaming speech recognition, speech translation, and simultaneous translation, based on fairseq.
## Simultaneous Speech Translation

Codebase for simultaneous speech translation experiments, built on fairseq.
## Implemented

### Encoder

- Emformer

### Streaming Models

- Wait-k [example]
- Monotonic Multihead Attention [example]
- Continuous Integrate-and-Fire [example]
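To make the wait-k policy above concrete: the model first reads k source segments, then alternates between writing one target token and reading one more segment until the source is exhausted. A minimal simulation of that schedule (the function name and shape are illustrative, not this repo's API):

```python
def waitk_policy(k, src_len, tgt_len):
    """Simulate the wait-k read/write schedule.

    Reads until it is k segments ahead of the tokens written (or the
    source is exhausted), otherwise writes the next target token.
    """
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        if read < min(written + k, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions
```

For k=2 with four source segments and four target tokens, the policy reads two segments up front, interleaves in the middle, and flushes the remaining tokens once the source ends.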
## Setup

- Install fairseq:

  ```shell
  git clone https://github.com/pytorch/fairseq.git
  cd fairseq
  git checkout 4a7835b
  python setup.py build_ext --inplace
  pip install .
  ```

- (Optional) Install apex for faster mixed-precision (fp16) training.

- Install the dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Update the submodules:

  ```shell
  git submodule update --init --recursive
  ```
## Pre-trained model

ASR model with an Emformer encoder and a Transformer decoder, pre-trained with a joint CTC cross-entropy loss.

| MuST-C (WER) | en-de (V2) | en-es |
|---|---|---|
| dev | 9.65 | 14.44 |
| tst-COMMON | 12.85 | 14.02 |
| model | download | download |
| vocab | download | download |
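The joint CTC cross-entropy objective mentioned above combines a CTC loss on the encoder output with a label cross-entropy on the decoder output. A hedged PyTorch sketch of that interpolation (function name, shapes, and the 0.3 weight are my assumptions, not this repo's code):

```python
import torch
import torch.nn.functional as F

def joint_ctc_ce_loss(encoder_logits, decoder_logits, src_lengths,
                      targets, target_lengths, ctc_weight=0.3, blank=0):
    """Joint CTC + cross-entropy loss sketch.

    encoder_logits: (B, T, V) frame-level logits from the encoder
    decoder_logits: (B, S, V) token-level logits from the decoder
    targets:        (B, S) target token ids (blank id excluded)
    """
    # CTC branch: align encoder frames with the transcript.
    log_probs = F.log_softmax(encoder_logits, dim=-1).transpose(0, 1)  # (T, B, V)
    ctc = F.ctc_loss(log_probs, targets, src_lengths, target_lengths,
                     blank=blank, zero_infinity=True)
    # Cross-entropy branch: decoder predictions against the target tokens.
    ce = F.cross_entropy(decoder_logits.reshape(-1, decoder_logits.size(-1)),
                         targets.reshape(-1))
    # Interpolate the two objectives.
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce
```

The CTC term regularizes the encoder toward monotonic alignments, which is particularly useful for streaming encoders such as Emformer.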
## Sequence-level Knowledge Distillation

| MuST-C (BLEU) | en-de (V2) |
|---|---|
| valid | 31.76 |
| distillation | download |
| vocab | download |
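Sequence-level knowledge distillation replaces the reference translations in the training set with a teacher model's beam-search outputs, and the student is trained on those instead. A toy sketch of the data-building step (the helper and the stand-in teacher are hypothetical, for illustration only):

```python
def distill_corpus(pairs, teacher_decode):
    """Sequence-level KD: keep each source sentence, but swap the
    reference target for the teacher model's decoded hypothesis."""
    return [(src, teacher_decode(src)) for src, _ref in pairs]

# Toy teacher that uppercases the source, standing in for beam search.
pairs = [("hello world", "hallo welt")]
distilled = distill_corpus(pairs, lambda s: s.upper())
```

In practice the teacher hypotheses come from decoding the full training split with an offline translation model; the resulting distilled targets are what the "distillation" download above provides.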
## Citation

Please consider citing our paper:

```bibtex
@inproceedings{chang22f_interspeech,
  author={Chih-Chiang Chang and Hung-yi Lee},
  title={{Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={5175--5179},
  doi={10.21437/Interspeech.2022-10627}
}
```