keras-kaldi

changed the feature stream pipelines

Open mhy-kevin-dev opened this issue 7 years ago • 2 comments

Hi, Kumar,

I found that the feature pipeline in Kaldi appears to run "add-deltas..." first and then "splice-feats ...", so I made a small fix to steps_kt/dataGenerator.py and steps_kt/decode.sh, changing the feature pipeline from "splice-feats... | add-deltas..." to "add-deltas... | splice-feats...".
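To make the two orderings concrete, here is a minimal pure-Python sketch. It is not code from this repository or from Kaldi: the helpers `splice` and `add_deltas` are hypothetical stand-ins for splice-feats and add-deltas, the data is a toy example, and only first-order deltas are computed, with edge frames repeated at the utterance boundaries.

```python
def splice(frames, context):
    """Concatenate each frame with `context` neighbours on each side,
    repeating the first/last frame at the utterance edges."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for offset in range(-context, context + 1):
            row.extend(frames[min(max(t + offset, 0), last)])
        out.append(row)
    return out


def add_deltas(frames, window=2):
    """Append first-order regression deltas to every frame (simplified)."""
    last = len(frames) - 1
    norm = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(len(frames)):
        delta = [0.0] * len(frames[t])
        for k in range(1, window + 1):
            ahead = frames[min(t + k, last)]
            behind = frames[max(t - k, 0)]
            for d in range(len(delta)):
                delta[d] += k * (ahead[d] - behind[d]) / norm
        out.append(list(frames[t]) + delta)
    return out


# Toy utterance: 8 frames of 2-dimensional features (assumed values).
frames = [[float(t * t), float(t)] for t in range(8)]

# Old order in this issue: splice first, then add deltas.
splice_then_delta = add_deltas(splice(frames, context=1))

# Proposed order: add deltas first, then splice.
delta_then_splice = splice(add_deltas(frames), context=1)

print(len(splice_then_delta[0]), len(delta_then_splice[0]))
print(splice_then_delta == delta_then_splice)
```

In this toy version both orders produce frames of the same dimension, but the columns are laid out differently, so training and decoding have to use the same order.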

I have tested this modification on several corpora, and performance improved in most cases.

Thanks, kevin yang

mhy-kevin-dev, Sep 27 '17 06:09

Hello Kevin,

Thank you for the suggestion. Can you comment on how big the WER difference is, and on how many hours of data the training setup used?

When I wrote the code, I followed CMU's Kaldi+PDNN setup as the reference, which splices first and then adds deltas: https://github.com/yajiemiao/kaldipdnn/blob/master/steps_pdnn/build_nnet_pfile.sh

Thanks, Pavan.

dspavankumar, Sep 28 '17 10:09

Hi, Kumar,

I tested this on TIMIT and MATBN (about 40 hours). The alignments were obtained by Viterbi alignment of the training data, following the Kaldi recipe.

  • On TIMIT, the PER improved by ~0.3% with MFCC (monophone alignments) and by a relatively large ~1% with fMLLR (LDA+MLLT+SAT alignments).
  • On MATBN, the WER improved by ~0.2% with FBANK+Pitch and by ~0.4% with MFCC (both with LDA+MLLT+SAT alignments).

However, I have also noticed that the number of pdfs (or alignments) on TIMIT and WSJ differs between your settings and mine (e.g., my TIMIT monophone GMM has 144 pdfs and my LDA+MLLT system has 2012 pdfs, both generated with the Kaldi recipe). I wonder whether this difference is what causes the improvements I see with this feature pipeline. Maybe you can try this feature stream in your experimental setup :)

As for the ordering, I followed the Kaldi nnet1 recipe: https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/nnet/train.sh (line 227: Add deltas; line 250: Set $feat_dim; line 255: Make default proto with splice)

This script appends "add-deltas" to $feats_tr (or $feats_cv) and sets $feat_dim to the resulting feature dimension at line 250. The splicing is then defined at line 255 in feature_transform_proto, which specifies the Splice InputDim and OutputDim: <Splice> <InputDim> $feat_dim <OutputDim> $(((2*splice+1)*feat_dim)) ...
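The dimension bookkeeping described above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, and the example values (13 base coefficients, delta order 2, splice of 5) are assumptions chosen for illustration, not settings taken from this thread.

```python
def delta_dim(base_dim, delta_order=2):
    """Dimension after appending deltas up to `delta_order`:
    the static features plus one extra copy per delta order."""
    return base_dim * (delta_order + 1)


def splice_output_dim(feat_dim, splice):
    """Mirrors the shell expression $(((2*splice+1)*feat_dim)):
    `splice` frames of context on each side plus the centre frame."""
    return (2 * splice + 1) * feat_dim


# Assumed example: 13 static coefficients, deltas + delta-deltas, splice of 5.
feat_dim = delta_dim(13, delta_order=2)              # Splice InputDim
output_dim = splice_output_dim(feat_dim, splice=5)   # Splice OutputDim

print(feat_dim, output_dim)
```

Because $feat_dim is measured after add-deltas has been appended, the Splice component in this recipe operates on features that already contain the deltas.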

mhy-kevin-dev, Oct 03 '17 01:10