openspeech
Am I training properly?
I am not familiar with ASR tasks, so I'd be glad if anyone could answer my question:
I am training ContextNet, which is basically an RNN-T-type model as in the original paper. Because I need only the encoder part of the model, I am using 'contextnet' instead of 'contextnet_transducer', which trains faster and uses less memory. Since I only have the encoder, I also use 'ctc' as the criterion instead of 'rnnt'.
I am not sure if this configuration is valid for proper training.
Now at epoch 8, valid_wer=0.996 and valid_cer=7.140, and I doubt these are in the expected range.
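To illustrate what I mean by encoder-only training with CTC: the encoder emits a per-frame distribution over tokens, and the CTC loss aligns it to the transcript without the prediction/joint network that the RNN-T loss would need. A minimal sketch with torch.nn.CTCLoss (the shapes and vocab size are made up; this is not openspeech's actual code):

```python
# Minimal sketch (not openspeech code): CTC loss applied directly to
# frame-level encoder outputs. Shapes and vocab size are illustrative.
import torch

batch, time_steps, vocab_size, target_len = 4, 200, 128, 30
encoder_log_probs = torch.randn(time_steps, batch, vocab_size).log_softmax(dim=-1)
targets = torch.randint(1, vocab_size, (batch, target_len))   # index 0 reserved for blank
input_lengths = torch.full((batch,), time_steps, dtype=torch.long)
target_lengths = torch.full((batch,), target_len, dtype=torch.long)

ctc = torch.nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(encoder_log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```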
My training script is as below:
python ./openspeech_cli/hydra_train.py \
dataset=librispeech dataset.dataset_path=/home/ubuntu/TEST/libri \
dataset.dataset_download=False dataset.manifest_file_path=/home/ubuntu/TEST/libri/LibriSpeech/libri_subword_manifest.txt \
tokenizer=libri_subword \
model=contextnet \
audio=fbank \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
criterion=ctc \
tokenizer.vocab_path=/home/ubuntu/TEST/libri/LibriSpeech/ \
trainer.sampler=random \
lr_scheduler.peak_lr=0.0025 \
audio.frame_length=25.0 \
trainer.batch_size=128
I don't think the training is going correctly. Can I see the loss graph?
Hello, I guess I have the same issue with:
model_name: contextnet_lstm
model_size: medium
input_dim: 80
num_encoder_layers: 5
num_decoder_layers: 2
kernel_size: 5
num_channels: 256
encoder_dim: 640
num_attention_heads: 8
attention_dropout_p: 0.1
decoder_dropout_p: 0.1
max_length: 128
teacher_forcing_ratio: 1.0
rnn_type: lstm
decoder_attn_mechanism: loc
optimizer: adam
WER does not improve after a few epochs.
@resurgo97 Hello. Did you manage to fix this issue in the end? Thanks, Iuliia
Can you show the log? cc. @upskyy
Hi @sooftware. Attached here. Thank you! logs_20220201_2.log
Something is obviously weird here. @upskyy, take a look at this. The loss is too large.
I think we need to check whether the lr is being adjusted.
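For example, one quick way to check is to log the learning rate with PyTorch Lightning's LearningRateMonitor, assuming you can edit wherever the Trainer is constructed (this is just a sketch, not the exact openspeech setup):

```python
# Sketch: log the current lr every step so warmup / reduce-on-plateau behaviour
# becomes visible in the logger. Assumes access to the Trainer construction.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor

trainer = pl.Trainer(
    max_epochs=20,  # illustrative value
    callbacks=[LearningRateMonitor(logging_interval="step")],
)
# trainer.fit(model, datamodule=datamodule)
# The lr-* curves in the logger (e.g. TensorBoard) then show whether the
# scheduler is actually stepping.
```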
@yunigma Can you attach the command that you used?
Thank you!!
python ./openspeech_cli/hydra_train.py \
dataset=librispeech \
dataset.dataset_download=False \
dataset.dataset_path="../../../../database/LibriSpeech/" \
dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_subword_manifest.txt" \
tokenizer=libri_subword \
model=contextnet_lstm \
audio=fbank \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
criterion=cross_entropy
I'll test it. I need to download new data, so please wait a little bit.
I'm really sorry for the late reply.
I trained the contextnet model with ctc and confirmed that the training works well.


python ./openspeech_cli/hydra_train.py \
dataset=ksponspeech \
tokenizer=kspon_character \
model=contextnet \
audio=fbank \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
criterion=ctc
@upskyy Thank you! Grrreat!
@upskyy Thank you very much for testing! Do you think the training might just improve more slowly on the LibriSpeech dataset, or is there some error in the training itself?
@yunigma I think it's a subword-related issue rather than a LibriSpeech dataset issue. I've confirmed that training works with kspon_character, so how about trying it with libri_character?
Hi @upskyy ! I have finally tried to reproduce the same setup but with librispeech.
python ./openspeech_cli/hydra_train.py dataset=librispeech \
dataset.dataset_download=False \
dataset.dataset_path="../database/LibriSpeech/" \
dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_char_manifest.txt" \
tokenizer=libri_character \
model=contextnet \
audio=fbank \
lr_scheduler=warmup_reduce_lr_on_plateau \
trainer=gpu \
criterion=ctc
After two days of training (100 epochs) I got these results:
I do not know why WER went up at some point... Also, I see that the global step in my case is different from yours; it is very small.
@upskyy I think there is an error in the WER calculation process.
But if I look at the graph uploaded by @upskyy, I don't think so. What makes this difference?
Probably, otherwise CER would also go down... But CER was not improving much either. Do you know how long @upskyy was training and with which parameters? Here are my logs: log_ctc_char.txt
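Just to illustrate the WER/CER gap we are discussing: both are the same edit-distance calculation, only over different units, so a decode that loses word boundaries can keep WER near 1.0 while CER keeps falling. A rough sketch (not openspeech's metric implementation):

```python
# Rough sketch (not openspeech's metric code): WER and CER are Levenshtein
# distances over words and characters respectively.
def levenshtein(ref, hyp):
    """Edit distance between two token sequences (single rolling row)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref: str, hyp: str) -> float:
    return levenshtein(ref.split(), hyp.split()) / max(1, len(ref.split()))

def cer(ref: str, hyp: str) -> float:
    return levenshtein(list(ref), list(hyp)) / max(1, len(ref))

# A hypothesis that drops the spaces hurts WER far more than CER:
print(wer("the cat sat", "thecatsat"))  # 1.0  (all three words counted wrong)
print(cer("the cat sat", "thecatsat"))  # ~0.18 (only the two spaces are missing)
```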
@yunigma I think I trained for about 36 hours. Detailed parameters are written in the log. I wonder what might have made the difference. 😢 I'll test it out when I have time.
Hi all, I ran into an issue when training a model with openspeech. I do not see the sp.model file in the LibriSpeech folder. Could you help me?
You may download the related LibriSpeech files below (from the README.md):
|LibriSpeech|character|[Link]|[Link]|-|
|LibriSpeech|subword|[Link]|[Link]|[Link]|
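If downloading is not convenient, you can also build an sp.model yourself with the sentencepiece library. A rough sketch, assuming a plain-text file of transcripts; the file name, vocab size, and model type are placeholders, not necessarily what openspeech's preprocessing uses:

```python
# Sketch: train a SentencePiece model locally. "librispeech_transcripts.txt"
# (one transcript per line), vocab_size and model_type are assumptions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="librispeech_transcripts.txt",
    model_prefix="sp",        # writes sp.model and sp.vocab
    vocab_size=5000,
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="sp.model")
print(sp.encode("HELLO WORLD", out_type=str))
```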
@yunigma @upskyy Did you solve this problem? I met the same problem when training the squeezeformer network with LibriSpeech. The CER is going down; however, the WER is not. (I used "libri_character" as the tokenizer and "libri_char_manifest.txt" as the manifest_file_path.)
When I used "libri_subword" as the tokenizer and "libri_subword_manifest.txt" as the manifest_file_path, both the CER and WER went down during training. However, neither the CER nor the WER can get very low. :(
Thank you!
Both CER and WER are going down after 3 epochs of training.