keras-kaldi
changed the feature stream pipelines
Hi, Kumar,
I found that the feature pipeline in Kaldi runs "add-deltas..." first and then "splice-feats ...", so I made a small fix to the feature pipelines in steps_kt/dataGenerator.py and steps_kt/decode.sh, changing them from "splice-feats... | add-deltas..." to "add-deltas... | splice-feats...".
I have tested this modification on different corpora, and performance improved in most cases.
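The reordering described above can be sketched as a small helper that builds either pipeline as a Kaldi rspecifier string. This is only an illustration, not the code in steps_kt/dataGenerator.py: the function name, the `scp:feats.scp` input, and the context width of 5 are assumptions, and the original commands' options are elided in the message above.

```python
def feature_pipeline(feats_rspecifier, context=5, deltas_first=True):
    """Build a Kaldi-style piped rspecifier (hypothetical helper).

    deltas_first=True  -> add-deltas | splice-feats  (the proposed order)
    deltas_first=False -> splice-feats | add-deltas  (the previous order)
    """
    add_deltas = "add-deltas ark:- ark:-"
    # The context width is an assumed value, not taken from the repository.
    splice = ("splice-feats --left-context={0} --right-context={0} "
              "ark:- ark:-".format(context))
    stages = [add_deltas, splice] if deltas_first else [splice, add_deltas]
    # The trailing '|' tells Kaldi to read features from the command's stdout.
    return "ark:copy-feats {0} ark:- | {1} |".format(
        feats_rspecifier, " | ".join(stages))


new_order = feature_pipeline("scp:feats.scp", deltas_first=True)
old_order = feature_pipeline("scp:feats.scp", deltas_first=False)
print(new_order)
print(old_order)
```

Only the order of the two middle stages changes; the input and the options passed to each tool stay the same.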
Thanks, kevin yang
Hello Kevin,
Thank you for the suggestion. Can you comment on how big the WER difference is, and on how many hours of data the training setup used?
When I wrote the code I followed CMU's Kaldi+PDNN setup as the reference, which splices first and then adds deltas: https://github.com/yajiemiao/kaldipdnn/blob/master/steps_pdnn/build_nnet_pfile.sh
Thanks, Pavan.
Hi, Kumar,
I tested this on TIMIT and MATBN (about 40 hours). The alignments were obtained by Viterbi alignment of the training data, following the Kaldi recipe.
- On TIMIT, the PER improvement was ~0.3% with MFCC (monophone alignments), and relatively larger (~1%) with fMLLR (LDA+MLLT+SAT alignments).
- On MATBN, the WER improvement was ~0.2% with FBANK+Pitch (LDA+MLLT+SAT alignments) and ~0.4% with MFCC (LDA+MLLT+SAT alignments).
However, I have also noticed that the number of pdfs (and hence the alignments) on TIMIT and WSJ differs between your setup and mine. (E.g., my TIMIT monophone GMM has 144 pdfs and my LDA+MLLT system has 2012 pdfs, both generated from the Kaldi recipe.) I wonder whether this difference is what causes the improvements when I use this feature pipeline. Maybe you can try this feature pipeline in your experimental setup :)
As for the ordering, I followed the Kaldi nnet1 recipe: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/nnet/train.sh (line 227: Add deltas; line 250: Set $feat_dim; line 255: Make default proto with splice)
This script appends "add-deltas" to $feats_tr (and $feats_cv), and sets $feat_dim to the resulting feature dimension at line 250.
The splicing is configured at line 255 in feature_transform_proto, which contains the Splice component's InputDim and OutputDim:
<Splice> <InputDim> $feat_dim <OutputDim> $(((2*splice+1)*feat_dim)) ...
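The dimension arithmetic in that proto line can be checked with a short sketch. The 13-dimensional base features, delta order 2, and splice of 5 are assumed example values, and the helper names are hypothetical rather than taken from the Kaldi script.

```python
def deltas_dim(base_dim, delta_order=2):
    """Dimension after add-deltas: statics plus one block per delta order."""
    return base_dim * (delta_order + 1)


def spliced_dim(feat_dim, splice=5):
    """Dimension after splicing `splice` frames on each side,
    mirroring $(((2*splice+1)*feat_dim)) from the proto line."""
    return (2 * splice + 1) * feat_dim


base_dim = 13  # assumed example: 13-dimensional MFCC

# add-deltas first, then splice (the nnet1-style order)
feat_dim = deltas_dim(base_dim)
deltas_then_splice = spliced_dim(feat_dim)

# splice first, then add-deltas (the PDNN-style order)
splice_then_deltas = deltas_dim(spliced_dim(base_dim))

print(feat_dim, deltas_then_splice, splice_then_deltas)
```

With these example values both orders give the same final input dimension, so the network input size alone does not reveal which order a setup uses.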