speech-transformer-pytorch_lightning
speech-transformer-pytorch_lightning copied to clipboard
ASR project with pytorch-lightning
End2End chinese-english code-switch speech recognition in pytorch
This is a mixed project borrowing from many awesome projects opened recently.
With pytorch-lightning, experiments can be carried out easily. And i will try to make evey calculation in a batched and cleaned way. (such as add bos & eos into batched target and spec augment) Any ideas can be put into the issues, and welcome for discussion. (This project is still being building and reorganizing)
project features:
joint attention & ctc beam search decode with rnn lm
multi dataset
using pytorch lightning for 16bit training
Chinese-char level & English-word level tokenizer
sentence piece tokenizer for english tokenizing
rnn_lm training
label smoothing
customized transformer encoder and decoder see: src/model/modules/transformer_encoder...
*rezero transformer for some converge problem with half precision and speed consideration
feature:
log fbank with sub sample
speed augment
a spec augment using gpu as a layer in model
customized feature filtering , see src/loader/utils/build_fbank remove_empty_line_2d
optimizer:
Ranger
model:
rezero transformer
restricted encoder field
better mask (may be a little slower than other project but effective)
loss:
lambda * ce loss + (1-lambda * ctc loss) + code switch loss
requirement:
see docker/
references:
https://github.com/ZhengkunTian/OpenTransformer
https://github.com/espnet/espnet
https://github.com/jadore801120/attention-is-all-you-need-pytorch
https://github.com/alphadl/lookahead.pytorch
https://github.com/LiyuanLucasLiu/RAdam
https://github.com/vahidk/tfrecord
https://github.com/kaituoxu/Speech-Transformer
https://github.com/majumderb/rezero
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
data: aishell1 170h aishell2 1000h magic data 750h prime 100h not used stcmd 100h not used datatang 200h datatang 500h datatang mix 200h librispeech 960h
train step
english -> eng(sub) + mix + chinese -> chinese + mix -> mix