Training error when running the tedlium recipe
Hi all,
I have already installed Eesen and I am trying to run the tedlium recipe, but I get this error:
train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
WARNING (train-ctc-parallel:SelectGpuId():cuda-device.cc:150) Suggestion: use 'nvidia-smi -c 1' to set compute exclusive mode
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 4 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX 1080 Ti free:189M, used:10985M, total:11175M, free/total:0.0169513
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(1): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(2): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(3): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 1 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [1]: GeForce GTX 1080 Ti free:10983M, used:195M, total:11178M, free/total:0.982556 version 6.1
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".
[stack trace: ]
eesen::KaldiGetStackTrace[abi:cxx11]()
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::ExpectToken(std::istream&, bool, char const*)
eesen::BiLstm::ReadData(std::istream&, bool)
eesen::Layer::Read(std::istream&, bool, bool)
.
.
.
eesen::Net::Read(std::istream&, bool)
eesen::Net::Read(std::__cxx11::basic_string<char, std::char_traits
How can I fix this? Thank you!
The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems the prototype file was generated with a prototype from the librispeech recipe (or https://github.com/jb1999/eesen), while the code you are running is srvk's Eesen?
Eesen's standard acoustic model does not contain the <ForwardDropoutFactor> token, but jb1999's Eesen does.
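One way to see which fork wrote the mismatched model is to list the tokens it actually contains (a sketch; the model path is copied from the log above and may differ on your machine):

```shell
# Hypothetical path taken from the log above; adjust to your experiment dir.
MODEL=exp/train_phn_l5_c320/nnet/nnet.iter0

# List every <...> token stored in the (binary) model file, to check
# whether it was written with <DropFactor> (jb1999 fork) or
# <ForwardDropoutFactor> (as the code you are running expects).
strings "$MODEL" | grep -o '<[A-Za-z]*>' | sort -u
```

If the output contains <DropFactor> but not <ForwardDropoutFactor>, the model was initialized by a different Eesen build than the one now reading it.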
I installed srvk's Eesen, not jb1999's Eesen.
Today I tried the librispeech recipe; the acoustic model training runs normally. Its nnet proto is:
<Nnet>
<BiLstmParallel> <InputDim> 360 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<AffineTransform> <InputDim> 640 <OutputDim> 44 <ParamRange> 0.1
<Softmax> <InputDim> 44 <OutputDim> 44
</Nnet>
But when I ran the tedlium recipe, acoustic model training failed with the error I posted above. Its nnet proto is:
<Nnet>
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<AffineTransform> <InputDim> 640 <OutputDim> 78 <ParamRange> 0.1
<Softmax> <InputDim> 78 <OutputDim> 78
</Nnet>
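One workaround that would be consistent with the working librispeech proto above (an assumption on my part, not a confirmed fix): give each BiLstmParallel line in the tedlium proto the same dropout fields the librispeech proto uses, then delete the already-initialized exp/train_phn_l5_c320/nnet/nnet.iter0 so it is regenerated from the new proto. The first layer would then read:

```
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
```

Alternatively, if you do not want dropout, rebuilding against a single Eesen checkout (so the proto generator and train-ctc-parallel come from the same source tree) should avoid the token mismatch entirely.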