
Am I training properly?

Open resurgo97 opened this issue 3 years ago • 21 comments

I am not familiar with ASR tasks, so I'd be glad if anyone could answer my question:

I am training ContextNet, which is basically an RNN-T type model as in the original paper. Because I need only the encoder part of the model, I am using 'contextnet' instead of 'contextnet_transducer', which trains faster and reduces memory usage. Since I only have the encoder, I also use 'ctc' as the criterion instead of 'rnnt'.

I am not sure if this configuration is valid for proper training.

[screenshot: training metrics]

Now at epoch 8, valid_wer=0.996 and valid_cer=7.140, but I doubt these are in the expected range.


My training script is as below:

python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_path=/home/ubuntu/TEST/libri \
    dataset.dataset_download=False \
    dataset.manifest_file_path=/home/ubuntu/TEST/libri/LibriSpeech/libri_subword_manifest.txt \
    tokenizer=libri_subword \
    model=contextnet \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc \
    tokenizer.vocab_path=/home/ubuntu/TEST/libri/LibriSpeech/ \
    trainer.sampler=random \
    lr_scheduler.peak_lr=0.0025 \
    audio.frame_length=25.0 \
    trainer.batch_size=128

resurgo97 avatar Dec 04 '21 02:12 resurgo97
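For reference, an encoder-only ContextNet trained with criterion=ctc amounts to projecting the encoder states onto the vocabulary and scoring them with CTC loss. A minimal sketch in plain PyTorch is below; the dimensions and names are illustrative assumptions, not openspeech's actual implementation. The main requirement is that the encoder output (time) length stays at least as long as the target length, otherwise CTC assigns an infinite loss to that utterance.

import torch.nn as nn

# Illustrative dimensions; openspeech's ContextNet encoder differs in detail.
vocab_size = 1000     # subword vocabulary, with index 0 reserved for the CTC blank
encoder_dim = 640

fc = nn.Linear(encoder_dim, vocab_size)            # encoder states -> vocabulary logits
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

def compute_loss(encoder_outputs, output_lengths, targets, target_lengths):
    # encoder_outputs: (batch, time, encoder_dim) produced by the ContextNet encoder
    logits = fc(encoder_outputs)
    log_probs = logits.log_softmax(dim=-1).transpose(0, 1)   # CTCLoss expects (time, batch, vocab)
    return ctc_loss(log_probs, targets, output_lengths, target_lengths)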

I don't think the training is going correctly. Can I see the loss graph?

upskyy avatar Dec 05 '21 16:12 upskyy

Hello, I think I have the same issue with this configuration:

  model_name: contextnet_lstm
  model_size: medium
  input_dim: 80
  num_encoder_layers: 5
  num_decoder_layers: 2
  kernel_size: 5
  num_channels: 256
  encoder_dim: 640
  num_attention_heads: 8
  attention_dropout_p: 0.1
  decoder_dropout_p: 0.1
  max_length: 128
  teacher_forcing_ratio: 1.0
  rnn_type: lstm
  decoder_attn_mechanism: loc
  optimizer: adam

WER does not improve after a few epochs.

Iuliia

yunigma avatar Feb 01 '22 15:02 yunigma

@resurgo97 hello. Did you manage to fix this issue finally? Thanks. Iuliia

yunigma avatar Feb 02 '22 08:02 yunigma

Can you show the log? cc. @upskyy

sooftware avatar Feb 02 '22 08:02 sooftware

Hi @sooftware, attached here. Thank you! logs_20220201_2.log

yunigma avatar Feb 02 '22 08:02 yunigma

Something is obviously off. @upskyy, look at this: the loss is too large.
I think we need to check whether the lr is being adjusted.

sooftware avatar Feb 02 '22 09:02 sooftware
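A simple way to check whether the learning rate is actually being adjusted is to log it at every step. openspeech trains with PyTorch Lightning, so Lightning's LearningRateMonitor callback can be attached to the trainer. The snippet below is a generic sketch rather than openspeech's own trainer setup; the resulting curve makes it obvious whether warmup and reduce-on-plateau are kicking in.

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import LearningRateMonitor

# Log the current learning rate at every optimizer step so warmup and
# reduce-on-plateau behaviour is visible in the logger (e.g. TensorBoard).
lr_monitor = LearningRateMonitor(logging_interval="step")
trainer = Trainer(callbacks=[lr_monitor])
# trainer.fit(model, datamodule=data_module)  # model / data module as in your own setup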

@yunigma Can you attach the command that you used?

sooftware avatar Feb 02 '22 09:02 sooftware

Thank you!!

python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=False \
    dataset.dataset_path="../../../../database/LibriSpeech/" \
    dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_subword_manifest.txt" \
    tokenizer=libri_subword \
    model=contextnet_lstm \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=cross_entropy

yunigma avatar Feb 02 '22 09:02 yunigma

I'll test it. I need to download new data, so please wait a little bit.

sooftware avatar Feb 02 '22 09:02 sooftware

I'm really sorry for the late reply.
I trained the contextnet model with the ctc criterion and confirmed that the training works well.

[Screenshot 2022-02-27 11:33:00 PM] [Screenshot 2022-02-27 11:33:17 PM]
python ./openspeech_cli/hydra_train.py \
    dataset=ksponspeech \
    tokenizer=kspon_character \
    model=contextnet \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc

upskyy avatar Feb 27 '22 14:02 upskyy

@upskyy Thank you! Grrreat!

sooftware avatar Feb 28 '22 09:02 sooftware

@upskyy Thank you very much for testing! Do you think training might just improve more slowly with the LibriSpeech dataset? Or is there some error in the training itself?

yunigma avatar Feb 28 '22 11:02 yunigma

@yunigma I think it's a subword-related issue rather than a LibriSpeech dataset issue. I've confirmed that training works with kspon_character, so how about trying it out with libri_character?

upskyy avatar Feb 28 '22 13:02 upskyy

Hi @upskyy! I have finally tried to reproduce the same setup, but with LibriSpeech.

python ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=False \
    dataset.dataset_path="../database/LibriSpeech/" \
    dataset.manifest_file_path="../../../openspeech/datasets/librispeech/libri_char_manifest.txt" \
    tokenizer=libri_character \
    model=contextnet \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=ctc

After two days of training (100 epochs) I got these results: [Screenshot 2022-04-04 at 17:20:10]

I do not know why WER went up at some point... Also, I see that the global step in my case is different from yours; it is very small.

yunigma avatar Apr 05 '22 07:04 yunigma

@upskyy I think there may be an error in the WER calculation process.
But looking at the graph uploaded by @upskyy, that doesn't seem to be the case, so what is making this difference?

sooftware avatar Apr 05 '22 07:04 sooftware
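For reference, WER and CER are the same metric at different granularity: a Levenshtein (edit) distance between hypothesis and reference, computed over words or characters and divided by the reference length. A minimal sketch is below; openspeech's own metric code may differ in detail, but any correct implementation should behave like this.

def edit_distance(ref, hyp):
    # Levenshtein distance between two token sequences, using a rolling 1-D table.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (r != h))    # substitution (0 if tokens match)
    return dp[-1]

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / len(ref)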

Probably; otherwise the CER would also go down... But the CER was not improving much either. Do you know how long @upskyy trained and with which parameters? Here are my logs: log_ctc_char.txt

yunigma avatar Apr 05 '22 07:04 yunigma

@yunigma I think I trained for about 36 hours. Detailed parameters are written in the log. I wonder what might have made the difference. 😢 I'll test it out when I have time.

hydra_train.log

upskyy avatar Apr 07 '22 05:04 upskyy

Hi all, I ran into issues when training a model with openspeech. I do not see the sp.model file in the LibriSpeech folder. Could you help me?

Luong-Github avatar Nov 27 '22 18:11 Luong-Github

You may download the related LibriSpeech files below (from the README.md):

| Dataset     | Unit      | Manifest | Vocab  | SP-Model |
|-------------|-----------|----------|--------|----------|
| LibriSpeech | character | [Link]   | [Link] | -        |
| LibriSpeech | subword   | [Link]   | [Link] | [Link]   |

ccyang123 avatar Jan 02 '23 08:01 ccyang123
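If the pretrained sp.model cannot be downloaded, the file can also be generated directly, since it is simply a sentencepiece model trained on the training transcripts. A rough sketch with the sentencepiece library is below; the input path and vocabulary size are placeholder assumptions and should be matched to the libri_subword tokenizer configuration, not taken from here.

import sentencepiece as spm

# Train a unigram sentencepiece model on a text file with one transcript per line.
# This writes sp.model and sp.vocab into the current directory.
spm.SentencePieceTrainer.Train(
    input="librispeech_transcripts.txt",   # placeholder path to the training transcripts
    model_prefix="sp",                     # output files: sp.model / sp.vocab
    vocab_size=5000,                       # placeholder; match the tokenizer's vocab size
    model_type="unigram",
)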

@yunigma @upskyy Did you solve this problem? I ran into the same problem when training the squeezeformer network with LibriSpeech: the CER goes down, but the WER does not. (I used "libri_character" as the tokenizer and "libri_char_manifest.txt" as the manifest_file_path.)

When I used "libri_subword" as the tokenizer and "libri_subword_manifest.txt" as the manifest_file_path, both the CER and WER went down during training. However, neither the CER nor the WER gets very low. :(

Thank you!

ccyang123 avatar Jan 02 '23 11:01 ccyang123
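This pattern is expected to some degree with a character tokenizer: a model can get most characters right, so CER falls, while still making at least one mistake in nearly every word, which keeps WER high. A small hypothetical illustration (substitution-only, so a position-wise comparison stands in for the full edit distance):

# Hypothetical example: the hypothesis gets exactly one character wrong in every word.
ref = "the cat sat on the mat"
hyp = "tha cet sit an thi mit"

char_errors = sum(r != h for r, h in zip(ref, hyp))                   # 6 wrong characters
print(char_errors / len(ref))                                         # CER ~= 0.27
word_errors = sum(r != h for r, h in zip(ref.split(), hyp.split()))   # all 6 words wrong
print(word_errors / len(ref.split()))                                 # WER = 1.0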

Both CER and WER are going down after 3 epochs of training. [screenshot]

ccyang123 avatar Jan 03 '23 06:01 ccyang123