Florian Metze

92 comments by Florian Metze

Jinserk, we'd love nothing more than for you to test the TF branch as well. Yes, it uses TF's LSTM and CTC implementations. In theory, everything is there but most...

Sorry all - we have not released a full recipe for this yet. We will probably have one on the Babel corpus very soon, and will be able to release...

Hi, can we make it so that the corpus files are not included in the pull request? I would like to have the recipe included, but it seems wrong to...

Yes, CMVN can be sensitive to short utterances. You may want to smooth utterances, or use a sliding window, if your data supports that. We did some experiments with...

The sliding window should typically be a few seconds long, no? Then it just computes some local context and assumes that the speaker characteristics don’t change quickly. For talks or...
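
To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It is not the toolkit's implementation: the function name `sliding_cmvn`, the centered window, and the default of 300 frames (about 3 seconds, assuming a 10 ms frame shift) are all assumptions chosen for illustration.

```python
from math import sqrt


def sliding_cmvn(frames, window=300, eps=1e-8):
    """Mean/variance-normalize each frame over a centered local window.

    frames: list of equal-length lists of floats, one list per frame.
    window: window length in frames (hypothetical default: 300 frames,
            roughly 3 s if the frame shift is 10 ms).
    eps:    small constant that avoids division by zero.
    """
    n = len(frames)
    half = window // 2
    out = []
    for t in range(n):
        # The window is truncated at the utterance boundaries.
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        ctx = frames[lo:hi]
        normed = []
        for d in range(len(frames[t])):
            vals = [f[d] for f in ctx]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            normed.append((frames[t][d] - mean) / sqrt(var + eps))
        out.append(normed)
    return out
```

With a window at least twice as long as the utterance, every frame sees the whole utterance and this reduces to ordinary per-utterance CMVN. A shorter window only tracks local statistics, which is where the assumption that speaker characteristics change slowly comes in.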

The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems that the prototype file has been generated with a prototype in the...

Eric, thanks for the flowers. The main problem is the use of the bi-directional LSTM as an acoustic model, which in theory requires you to have the whole segment available...

Yes, that is by and large correct. The big challenge is speaker diarization, unless you only have one speaker in your audio channel. Imagine you have two speakers, a loud...

http://www.asru2015.org/Papers/ViewPapers.asp?PaperNum=1103

> On Aug 4, 2017, at 5:37 PM, ericbolo wrote:
>
> Ok, I now have a pretty good understanding of the diarization/speaker normalization issues, none of them insurmountable...

Right, the paper does not use CTC loss, but I don't think this would matter much, certainly not for the LSTMs, which is where we have the recurrent connections. CTC...