Issue on model fine-tuning: sota/2019/librispeech
Hi, I'm using the fork command on am_resnet_ctc_librispeech_dev_other.bin to adapt the model to my own dataset, and I got the following error, which says Loss has NaN values:
I0723 11:44:53.263063 2870 W2lListFilesDataset.cpp:147] 2703 files found.
I0723 11:44:53.263103 2870 Utils.cpp:102] Filtered 37/2703 samples
I0723 11:44:53.263499 2870 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2666
I0723 11:44:53.267585 2870 Train.cpp:564] Shuffling trainset ::: 28515
I0723 11:44:53.271889 2870 Train.cpp:571] Epoch 1 started!
F0723 11:44:54.493821 2870 Train.cpp:616] Loss has NaN values. Samples - train-clean-100-4014-186175-0018
*** Check failure stack trace: ***
@ 0x7efd5faca0cd google::LogMessage::Fail()
@ 0x7efd5facbf33 google::LogMessage::SendToLog()
@ 0x7efd5fac9c28 google::LogMessage::Flush()
@ 0x7efd5facc999 google::LogMessageFatal::~LogMessageFatal()
@ 0x7efd6a6d3ce7 _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEES_INS0_19FirstOrderOptimizerEES9_ddblE3_clES2_S5_S7_S9_S9_ddbl
@ 0x7efd6a668ca8 main
@ 0x7efd5edafb97 __libc_start_main
@ 0x7efd6a6cd10a _start
Aborted (core dumped)
root@fc6464776c28:~/wav2letter#
I tried to debug the source code; the audio samples and the list file were read into the trainset successfully. Could you help me find out the problem?
And here is my training config (train-office.cfg):
root@fc6464776c28:~/wav2letter# cat train-office.cfg
# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths
--datadir=/root/wav2letter/
--rundir=/root/wav2letter/training/
--archdir=/root/wav2letter/pre_model/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=wav
--arch=am_resnet_ctc.arch
--tokensdir=/root/wav2letter/pre_model
--tokens=librispeech-train-all-unigram-10000.tokens
--lexicon=/root/wav2letter/pre_model/librispeech-train+dev-unigram-10000-nbest10.lexicon
--criterion=ctc
--wordseparator=_
--usewordpiece=true
--sampletarget=0.1
--lr=0.4
--linseg=0
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--lrcosine
--nthread=4
--batchsize=1
--runname=talk51_trainlogs
--iter=500
--mintsz=2
--minisz=2
I've tried setting --iter to 10000000 and setting other params as in train_am_transformer_ctc.cfg from sota/2019/librispeech, but I still get the same error.
Are you running this on Librispeech data (because in your config the train data is specified as train-clean-100.lst)? Could you show the running command itself and the full log after you run the command (it seems you are training from scratch, not fine-tuning the model)?
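For reference, fine-tuning is launched via the fork mode of the Train binary, while train mode starts from scratch. A minimal sketch, assuming the pretrained model sits in your pre_model directory and a standard build location for the binary (both paths are assumptions, adjust to your setup):

```
# fork mode: continue training from a pretrained model (fine-tuning);
# binary and model paths below are assumptions
./build/Train fork /root/wav2letter/pre_model/am_resnet_ctc_librispeech_dev_other.bin \
  --flagsfile=/root/wav2letter/train-office.cfg

# train mode: train a new model from scratch
./build/Train train --flagsfile=/root/wav2letter/train-office.cfg
```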
Am I able to use audio samples in wav format with the CTC criterion? In the example, flac is used for CTC, so my question is: can I use wav for CTC? @tlikhomanenko
Yep, wav format is supported, feel free to use it (for example, the TIMIT recipe uses wav files).
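The audio format only matters to the audio loader; the .lst entries look the same either way. A sketch of list lines pointing at wav files (the ids, paths, durations, and transcripts below are made up):

```
# .lst format: <sample_id> <audio_path> <duration_ms> <transcript>
office-0001 /root/wav2letter/audio/office-0001.wav 2480.5 turn the lights on
office-0002 /root/wav2letter/audio/office-0002.wav 1912.0 what time is it
```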
I solved the "Loss has NaN values" issue by reducing lr to 0.001. @Rootian, link for reference: https://github.com/facebookresearch/wav2letter/issues/334
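Concretely, that means editing one flag in the config posted above, leaving everything else unchanged:

```
# in train-office.cfg, change
--lr=0.4
# to
--lr=0.001
```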
Thanks @tlikhomanenko, I will try training with the CTC criterion on wav files.