D S Pavan Kumar
Use `feat-to-dim` from Kaldi. This will usually be 13 dimensions (MFCCs). We also add delta and acceleration coefficients (`add-deltas`) in the script, which makes it 13\*3=39. Then we concatenate 11...
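To make the 13 → 39 arithmetic concrete, here is a rough NumPy sketch. It is not Kaldi's `add-deltas` (which uses a windowed regression formula); `add_deltas_sketch` is a hypothetical helper that uses plain finite differences, and the 200-frame utterance is made up for illustration.

```python
import numpy as np

def add_deltas_sketch(feats):
    """Append first- and second-order time differences to each frame.

    Simplified stand-in for Kaldi's add-deltas, not its exact formula.
    """
    delta = np.gradient(feats, axis=0)   # first-order difference over time
    accel = np.gradient(delta, axis=0)   # second-order difference over time
    return np.hstack([feats, delta, accel])

# Hypothetical utterance: 200 frames of 13 MFCCs each.
mfcc = np.random.randn(200, 13)
feats = add_deltas_sketch(mfcc)
print(feats.shape)  # (200, 39)
```

The number of rows (frames) is unchanged; only the per-frame dimension triples.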
No, the dimension of the frame is independent of the number of frames in an utterance. Typically, no two audio files will have the same number of frames. However, the...
Typically the sampling rate is the same across the dataset. Even if the sampling rate is different, we could extract the same number of cepstra (or any other features) that form frames,...
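A small sketch of why the two quantities are independent: the frame count follows from the duration and the framing parameters, while the frame dimension is whatever number of cepstra you choose to extract. The 25 ms window and 10 ms shift are assumed values (common defaults, not taken from this thread), and `num_frames` is a hypothetical helper that ignores any edge padding a real feature extractor might apply.

```python
NUM_CEPSTRA = 13  # frame dimension: chosen by configuration, not by the audio

def num_frames(num_samples, sample_rate, frame_length_ms=25.0, frame_shift_ms=10.0):
    """Count full frames in a signal (assumed 25 ms window, 10 ms shift)."""
    frame_length = int(sample_rate * frame_length_ms / 1000)
    frame_shift = int(sample_rate * frame_shift_ms / 1000)
    if num_samples < frame_length:
        return 0
    return 1 + (num_samples - frame_length) // frame_shift

# Two seconds of audio at two different sampling rates, and a three-second file.
frames_16k = num_frames(2 * 16000, 16000)
frames_8k = num_frames(2 * 8000, 8000)
frames_long = num_frames(3 * 16000, 16000)

for n in (frames_16k, frames_8k, frames_long):
    print((n, NUM_CEPSTRA))  # feature matrix shape: (frames, dimension)
```

Under these assumptions the two 2-second files yield the same number of frames despite the different sampling rates, the 3-second file yields more, and every frame has 13 values.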
I've always gotten an error when the dimensions mismatched, precisely when the empty array of size `self.inputFeatDim` is appended with the received data: `self.x = numpy.concatenate ((self.x[self.batchPointer:], x))`. I don't think...
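The behaviour described above can be reproduced in isolation. This is a minimal sketch, assuming the buffer starts as an empty array of shape `(0, inputFeatDim)`; `input_feat_dim` and `x_buffer` are hypothetical stand-ins for `self.inputFeatDim` and `self.x`, and the value 39 is only an example.

```python
import numpy as np

input_feat_dim = 39                        # stand-in for self.inputFeatDim
x_buffer = np.empty((0, input_feat_dim))   # assumed shape of the empty buffer

# Matching dimension: concatenation along axis 0 succeeds.
good = np.random.randn(5, 39)
x_buffer = np.concatenate((x_buffer, good))
print(x_buffer.shape)  # (5, 39)

# Mismatched dimension: NumPy refuses to concatenate and raises ValueError.
bad = np.random.randn(5, 13)
try:
    np.concatenate((x_buffer, bad))
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print("dimension mismatch:", err)
```

So if the features on disk have a different dimension from the one the network expects, the failure shows up at this concatenation rather than passing silently.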
Thanks for reporting the bug. Just remove the line to fix it. I updated the repository.
Sorry for the late reply. The script should work with TensorFlow 1.x. Are you using TensorFlow 2.0?
Hello, it depends on multiple factors such as the language and dialect (e.g. en-GB, de-CH, fr-CA) you are trying to recognise, the nature of speech (because a model...
You can train using any directory such as tri1, tri2a, tri2b, sat, sgmm2 etc. The training script uses alignments from the provided existing model. You need to call the appropriate...
It usually occurs if you do not have read/write permissions to the file it is trying to access, or if you are trying to open a file that has the...