
Am I training properly?

Open resurgo97 opened this issue 3 years ago • 21 comments

I am not familiar with ASR tasks, so I'd be glad if anyone could answer my question:

I am training ContextNet, which is basically an RNN-T type model as in the original paper. Because I need only the encoder part of the model, I am using 'contextnet' instead of 'contextnet_transducer', which trains faster and reduces memory usage. Since I only have the encoder, I also use 'ctc' as the criterion instead of 'rnnt'.

I am not sure if this configuration is valid for proper training.

[screenshot: training metrics]

Now at epoch 8, valid_wer=0.996 and valid_cer=7.140, but I doubt these are in the expected range.


My training script is as below:

python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_path=/home/ubuntu/TEST/libri \
    dataset.dataset_download=False \
    dataset.manifest_file_path=/home/ubuntu/TEST/libri/LibriSpeech/libri_subword_manifest.txt \
    tokenizer=libri_subword \
    model=contextnet \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc \
    tokenizer.vocab_path=/home/ubuntu/TEST/libri/LibriSpeech/ \
    trainer.sampler=random \
    lr_scheduler.peak_lr=0.0025 \
    audio.frame_length=25.0 \
    trainer.batch_size=128

resurgo97 avatar Dec 04 '21 02:12 resurgo97
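For reference, an encoder-only ContextNet trained with criterion=ctc amounts to projecting the encoder states onto the vocabulary and scoring them with CTC loss. A minimal sketch in plain PyTorch is below; the dimensions and names are illustrative assumptions, not openspeech's actual implementation. The main requirement is that the encoder output (time) length stays at least as long as the target length, otherwise CTC assigns an infinite loss to that utterance.

import torch.nn as nn

# Illustrative dimensions; openspeech's ContextNet encoder differs in detail.
vocab_size = 1000     # subword vocabulary, with index 0 reserved for the CTC blank
encoder_dim = 640

fc = nn.Linear(encoder_dim, vocab_size)            # encoder states -> vocabulary logits
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

def compute_loss(encoder_outputs, output_lengths, targets, target_lengths):
    # encoder_outputs: (batch, time, encoder_dim) produced by the ContextNet encoder
    logits = fc(encoder_outputs)
    log_probs = logits.log_softmax(dim=-1).transpose(0, 1)   # CTCLoss expects (time, batch, vocab)
    return ctc_loss(log_probs, targets, output_lengths, target_lengths)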

I don't think the training is going correctly. Can I see the loss graph?

upskyy avatar Dec 05 '21 16:12 upskyy

Hello, I think I have the same issue with this configuration:

  model_name: contextnet_lstm
  model_size: medium
  input_dim: 80
  num_encoder_layers: 5
  num_decoder_layers: 2
  kernel_size: 5
  num_channels: 256
  encoder_dim: 640
  num_attention_heads: 8
  attention_dropout_p: 0.1
  decoder_dropout_p: 0.1
  max_length: 128
  teacher_forcing_ratio: 1.0
  rnn_type: lstm
  decoder_attn_mechanism: loc
  optimizer: adam

WER does not improve after a few epochs.

Iuliia

yunigma avatar Feb 01 '22 15:02 yunigma

@resurgo97 hello. Did you manage to fix this issue finally? Thanks. Iuliia

yunigma avatar Feb 02 '22 08:02 yunigma

Can you show the log? cc. @upskyy

sooftware avatar Feb 02 '22 08:02 sooftware

Hi @sooftware, attached here. Thank you! logs_20220201_2.log

yunigma avatar Feb 02 '22 08:02 yunigma

Something is obviously off. @upskyy, look at this: the loss is too large.
I think we need to check whether the lr is being adjusted.

sooftware avatar Feb 02 '22 09:02 sooftware
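A simple way to check whether the learning rate is actually being adjusted is to log it at every step. openspeech trains with PyTorch Lightning, so Lightning's LearningRateMonitor callback can be attached to the trainer. The snippet below is a generic sketch rather than openspeech's own trainer setup; the resulting curve makes it obvious whether warmup and reduce-on-plateau are kicking in.

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import LearningRateMonitor

# Log the current learning rate at every optimizer step so warmup and
# reduce-on-plateau behaviour is visible in the logger (e.g. TensorBoard).
lr_monitor = LearningRateMonitor(logging_interval="step")
trainer = Trainer(callbacks=[lr_monitor])
# trainer.fit(model, datamodule=data_module)  # model / data module as in your own setup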

@yunigma Can you attach the command that you used?

sooftware avatar Feb 02 '22 09:02 sooftware

Thank you!!

python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=False \
    dataset.dataset_path="../../../../database/LibriSpeech/" \
    dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_subword_manifest.txt" \
    tokenizer=libri_subword \
    model=contextnet_lstm \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=cross_entropy

yunigma avatar Feb 02 '22 09:02 yunigma

I'll test it. I need to download new data, so please wait a little bit.

sooftware avatar Feb 02 '22 09:02 sooftware

I'm really sorry for the late reply.
I trained the contextnet model with the ctc criterion and confirmed that the training works well.

[Screenshot 2022-02-27 11:33:00 PM] [Screenshot 2022-02-27 11:33:17 PM]
python ./openspeech_cli/hydra_train.py \
    dataset=ksponspeech \
    tokenizer=kspon_character \
    model=contextnet \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc

upskyy avatar Feb 27 '22 14:02 upskyy

@upskyy Thank you! Grrreat!

sooftware avatar Feb 28 '22 09:02 sooftware

@upskyy Thank you very much for testing! Do you think training might just improve more slowly with the LibriSpeech dataset? Or is there some error in the training itself?

yunigma avatar Feb 28 '22 11:02 yunigma

@yunigma I think it's a subword-related issue rather than a LibriSpeech dataset issue. I've confirmed that training works with kspon_character, so how about trying it out with libri_character?

upskyy avatar Feb 28 '22 13:02 upskyy

Hi @upskyy! I have finally tried to reproduce the same setup, but with LibriSpeech.

python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=False \
    dataset.dataset_path="../database/LibriSpeech/" \
    dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_char_manifest.txt" \
    tokenizer=libri_character \
    model=contextnet \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc

After two days of training (100 epochs) I got these results: [Screenshot 2022-04-04 at 17:20:10]

I do not know why WER went up at some point... Also, I see that the global step in my case is different from yours; it is very small.

yunigma avatar Apr 05 '22 07:04 yunigma

@upskyy I think there may be an error in the WER calculation process.
But looking at the graph uploaded by @upskyy, that doesn't seem to be the case, so what is making this difference?

sooftware avatar Apr 05 '22 07:04 sooftware
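For reference, WER and CER are the same metric at different granularity: a Levenshtein (edit) distance between hypothesis and reference, computed over words or characters and divided by the reference length. A minimal sketch is below; openspeech's own metric code may differ in detail, but any correct implementation should behave like this.

def edit_distance(ref, hyp):
    # Levenshtein distance between two token sequences, using a rolling 1-D table.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (r != h))    # substitution (0 if tokens match)
    return dp[-1]

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / len(ref)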

Probably; otherwise the CER would also go down... But the CER was not improving much either. Do you know how long @upskyy trained and with which parameters? Here are my logs: log_ctc_char.txt

yunigma avatar Apr 05 '22 07:04 yunigma

@yunigma I think I trained for about 36 hours. Detailed parameters are written in the log. I wonder what might have made the difference. 😢 I'll test it out when I have time.

hydra_train.log

upskyy avatar Apr 07 '22 05:04 upskyy

Hi all, I ran into issues when training a model with openspeech. I do not see the sp.model file in the LibriSpeech folder. Could you help me?

Luong-Github avatar Nov 27 '22 18:11 Luong-Github

You may download the related LibriSpeech files below (from the README.md):

| Dataset     | Unit      | Manifest | Vocab  | SP-Model |
|-------------|-----------|----------|--------|----------|
| LibriSpeech | character | [Link]   | [Link] | -        |
| LibriSpeech | subword   | [Link]   | [Link] | [Link]   |

ccyang123 avatar Jan 02 '23 08:01 ccyang123
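If the pretrained sp.model cannot be downloaded, the file can also be generated directly, since it is simply a sentencepiece model trained on the training transcripts. A rough sketch with the sentencepiece library is below; the input path and vocabulary size are placeholder assumptions and should be matched to the libri_subword tokenizer configuration, not taken from here.

import sentencepiece as spm

# Train a unigram sentencepiece model on a text file with one transcript per line.
# This writes sp.model and sp.vocab into the current directory.
spm.SentencePieceTrainer.Train(
    input="librispeech_transcripts.txt",   # placeholder path to the training transcripts
    model_prefix="sp",                     # output files: sp.model / sp.vocab
    vocab_size=5000,                       # placeholder; match the tokenizer's vocab size
    model_type="unigram",
)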

@yunigma @upskyy Did you solve this problem? I ran into the same problem when training the squeezeformer network with LibriSpeech: the CER goes down, but the WER does not. (I used "libri_character" as the tokenizer and "libri_char_manifest.txt" as the manifest_file_path.)

When I used "libri_subword" as the tokenizer and "libri_subword_manifest.txt" as the manifest_file_path, both the CER and WER went down during training. However, neither the CER nor the WER gets very low. :(

Thank you!

ccyang123 avatar Jan 02 '23 11:01 ccyang123
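This pattern is expected to some degree with a character tokenizer: a model can get most characters right, so CER falls, while still making at least one mistake in nearly every word, which keeps WER high. A small hypothetical illustration (substitution-only, so a position-wise comparison stands in for the full edit distance):

# Hypothetical example: the hypothesis gets exactly one character wrong in every word.
ref = "the cat sat on the mat"
hyp = "tha cet sit an thi mit"

char_errors = sum(r != h for r, h in zip(ref, hyp))                   # 6 wrong characters
print(char_errors / len(ref))                                         # CER ~= 0.27
word_errors = sum(r != h for r, h in zip(ref.split(), hyp.split()))   # all 6 words wrong
print(word_errors / len(ref.split()))                                 # WER = 1.0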

Both CER and WER are going down after 3 epochs of training. [screenshot]

ccyang123 avatar Jan 03 '23 06:01 ccyang123