
Serious bug in libri_train.py

Open · michaelklachko opened this issue on Jul 08 '18 · 4 comments

During preprocessing, the LibriSpeech dataset (e.g. train-clean-100) is split into multiple directories. During training, the code then iterates through these directories: https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/master/speechvalley/main/libri_train.py#L159

The problem is that for each directory, a new model is created according to the maxTimeSteps parameter for the inputs in that directory. So if train-clean-100 is split into 8 directories, we are training 8 separate models that don't share their weights (in fact, every time a model saves a checkpoint, it overwrites the checkpoint saved by the previous model).

The net effect is that we train only one model out of 8, and only on the data in the last directory, i.e. on 1/8 of the dataset.
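
A minimal toy sketch of the pattern described above (hypothetical names, not the repo's actual code): each directory gets a freshly built model, and every save goes to the same checkpoint, so only the weights trained on the last directory survive:

    # Toy illustration of the bug: one fresh "model" per directory,
    # one shared checkpoint path that each iteration overwrites.
    train_dirs = ['split_%d' % i for i in range(8)]

    checkpoint = None
    for dirpath in train_dirs:
        weights = {'w': 0.0}        # fresh model per directory: nothing is shared
        weights['w'] += 1.0         # stand-in for training on this shard only
        checkpoint = dict(weights)  # clobbers the previous directory's save

    print(checkpoint)  # reflects only the last directory's training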

michaelklachko commented on Jul 08 '18

I can put all training .npy files into one directory, but the real problem is that the model would have to fit the largest sample in the whole dataset: if the largest sample is 4000 time steps, then every sample would need to be padded to this size. This would make training extremely slow.

Look at https://github.com/fordDeepDSP/deepSpeech code for a better solution (bucketing sorted inputs).
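
For reference, here is a minimal sketch of that bucketing idea (my own illustration, not fordDeepDSP's actual code): sort utterances by length, then pad each batch only to the longest utterance in its own bucket rather than to the longest utterance in the whole dataset:

    import numpy as np

    def make_bucketed_batches(features, batch_size):
        """features: list of [time, feat_dim] arrays of varying length."""
        # sort indices by utterance length so each bucket holds similar lengths
        order = sorted(range(len(features)), key=lambda i: features[i].shape[0])
        batches = []
        for start in range(0, len(order), batch_size):
            bucket = [features[i] for i in order[start:start + batch_size]]
            max_t = max(f.shape[0] for f in bucket)  # pad only within the bucket
            padded = np.stack(
                [np.pad(f, ((0, max_t - f.shape[0]), (0, 0))) for f in bucket])
            lengths = np.array([f.shape[0] for f in bucket])
            batches.append((padded, lengths))
        return batches

    # 10 random utterances between 50 and 4000 frames, 39-dim features
    feats = [np.random.randn(np.random.randint(50, 4000), 39) for _ in range(10)]
    for padded, lengths in make_bucketed_batches(feats, batch_size=4):
        print(padded.shape, lengths)

Because batches are drawn from length-sorted inputs, the per-batch padding overhead stays small even when the longest utterance in the dataset is 4000 time steps.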

michaelklachko commented on Jul 09 '18

From L171, I think the logic is to restore the parameters saved after training on the previous folders? So I guess it's not training 8 separate models if the keep option is set to True:

    if keep == True:
        # if a checkpoint already exists in savedir, restore it before
        # training on the next folder, so weights carry over between folders
        ckpt = tf.train.get_checkpoint_state(savedir)
        if ckpt and ckpt.model_checkpoint_path:
            model.saver.restore(sess, ckpt.model_checkpoint_path)
            print('Model restored from:' + savedir)

RoyJames commented on Oct 01 '19

I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla's DeepSpeech, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are now better architectures, for example Jasper or transducer-based ones. Don't waste your time on this one.

michaelklachko commented on Oct 01 '19

> I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla's DeepSpeech, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are now better architectures, for example Jasper or transducer-based ones. Don't waste your time on this one.

I kind of agree after trying this repo on LibriSpeech, and thank you for the pointers. I also checked fordDeepDSP and SeanNaren's DeepSpeech2 PyTorch implementation, but I still see people there reporting trouble reaching a reasonable WER/CER without getting any responses. I just want to train on LibriSpeech, so I might have to try Kaldi now.

RoyJames commented on Oct 01 '19