Florian Metze comments

Results: 92 comments of Florian Metze

tensorflow example?

Jinserk, we'd love nothing more than for you to test the TF branch as well. Yes, it uses TF's LSTM and CTC implementations. In theory, everything is there but most...

tensorflow example?

Sorry all - we have not released a full recipe for this yet. We will probably have one on the Babel corpus very soon, and will be able to release...

Eesen for thchs30 recipe

Hi, can we make it so that the corpus files are not included in the pull request? I would like to have the recipe included, but it seems wrong to...

Importance of utterance lengths

Yes, CMVN (cepstral mean and variance normalization) can be sensitive to short utterances. You may want to smooth utterances, or have a sliding window - if your data supports that. We did some experiments with...

Importance of utterance lengths

The sliding window should typically be a few seconds long, no? Then it just computes some local context and assumes that the speaker characteristics don’t change quickly. For talks or...
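
To make the sliding-window idea concrete, here is a minimal sketch of windowed mean and variance normalization in plain Python. It is not taken from Eesen; the function name `sliding_cmvn`, the default of 300 frames (about 3 seconds, assuming a 10 ms frame shift), and the centered window are all assumptions chosen for illustration.

```python
import math

def sliding_cmvn(frames, window=300, eps=1e-8):
    """Normalize each frame using statistics from a window centered on it.

    frames: list of equal-length feature vectors (lists of floats).
    window: window length in frames (300 is an assumed default,
            roughly 3 s at an assumed 10 ms frame shift).
    eps:    small constant to avoid division by zero.
    """
    n = len(frames)
    half = window // 2
    out = []
    for t in range(n):
        # Clip the window at the utterance boundaries.
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        context = frames[lo:hi]
        normed = []
        for d in range(len(frames[t])):
            vals = [f[d] for f in context]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            normed.append((frames[t][d] - mean) / math.sqrt(var + eps))
        out.append(normed)
    return out
```

When the window is longer than the utterance, every frame sees the same statistics and the result reduces to ordinary per-utterance CMVN. That is also why very short utterances are fragile: the statistics come from only a handful of frames.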

Training Error when run tedlium recipe

The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems that the prototype file has been generated with a prototype in the...

Real-time decoding

Eric, thanks for the flowers. The main problem is the use of the bi-directional LSTM as an acoustic model, which in theory requires you to have the whole segment available...

Real-time decoding

Yes, that is by and large correct. The big challenge is speaker diarization, unless you only have one speaker in your audio channel. Imagine you have two speakers, a loud...

http://www.asru2015.org/Papers/ViewPapers.asp?PaperNum=1103

> On Aug 4, 2017, at 5:37 PM, ericbolo wrote:
>
> Ok, I now have a pretty good understanding of the diarization/speaker normalization issues, none of them insurmountable...

Real-time decoding

Right, the paper does not use CTC loss, but I don't think this would matter much, certainly not for the LSTMs, which is where we have the recurrent connections. CTC...