Training error when running the tedlium recipe
Hi all,
I have already installed Eesen and I am trying to run the tedlium recipe, but I get this error:
train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
WARNING (train-ctc-parallel:SelectGpuId():cuda-device.cc:150) Suggestion: use 'nvidia-smi -c 1' to set compute exclusive mode
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 4 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX 1080 Ti free:189M, used:10985M, total:11175M, free/total:0.0169513
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(1): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(2): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(3): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 1 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [1]: GeForce GTX 1080 Ti free:10983M, used:195M, total:11178M, free/total:0.982556 version 6.1
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".
[stack trace: ]
eesen::KaldiGetStackTrace[abi:cxx11]()
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::ExpectToken(std::istream&, bool, char const*)
eesen::BiLstm::ReadData(std::istream&, bool)
eesen::Layer::Read(std::istream&, bool, bool)
.
.
.
eesen::Net::Read(std::istream&, bool)
eesen::Net::Read(std::__cxx11::basic_string<char, std::char_traits
How can I fix this? Thank you!
The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems the prototype file was generated with a prototype from the librispeech recipe (or https://github.com/jb1999/eesen), while the code you are running is srvk's Eesen?
Eesen's standard acoustic model does not contain the <ForwardDropoutFactor> token, but jb1999's Eesen does.
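One way to see which fork wrote the mismatched model is to list the tokens it actually contains (a sketch; the model path is copied from the log above and may differ on your machine):

```shell
# Hypothetical path taken from the log above; adjust to your experiment dir.
MODEL=exp/train_phn_l5_c320/nnet/nnet.iter0

# List every <...> token stored in the (binary) model file, to check
# whether it was written with <DropFactor> (jb1999 fork) or
# <ForwardDropoutFactor> (as the code you are running expects).
strings "$MODEL" | grep -o '<[A-Za-z]*>' | sort -u
```

If the output contains <DropFactor> but not <ForwardDropoutFactor>, the model was initialized by a different Eesen build than the one now reading it.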
I installed srvk's Eesen, not jb1999's Eesen.
Today I tried the librispeech recipe; the acoustic model training runs normally. Its nnet proto is:
<Nnet>
<BiLstmParallel> <InputDim> 360 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<AffineTransform> <InputDim> 640 <OutputDim> 44 <ParamRange> 0.1
<Softmax> <InputDim> 44 <OutputDim> 44
</Nnet>
But when I ran the tedlium recipe, acoustic model training failed with the error I posted above. Its nnet proto is:
<Nnet>
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<AffineTransform> <InputDim> 640 <OutputDim> 78 <ParamRange> 0.1
<Softmax> <InputDim> 78 <OutputDim> 78
</Nnet>
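One workaround that would be consistent with the working librispeech proto above (an assumption on my part, not a confirmed fix): give each BiLstmParallel line in the tedlium proto the same dropout fields the librispeech proto uses, then delete the already-initialized exp/train_phn_l5_c320/nnet/nnet.iter0 so it is regenerated from the new proto. The first layer would then read:

```
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
```

Alternatively, if you do not want dropout, rebuilding against a single Eesen checkout (so the proto generator and train-ctc-parallel come from the same source tree) should avoid the token mismatch entirely.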