Tatiana Likhomanenko

Results 242 comments of Tatiana Likhomanenko

Could you provide an example to reproduce?

Hey! At https://github.com/flashlight/flashlight/blob/master/flashlight/app/asr/Decode.cpp#L641 you have the per-frame token indices in `rawTokenPrediction`, so you can do any postprocessing and print the computed word timings there. The only thing to have...

Well, I can only help with Decode.cpp (not the online inference, if that is what you are referring to). A quick question before going further: what are the values of the flags ```...

So from https://github.com/flashlight/flashlight/blob/master/flashlight/app/asr/Decode.cpp#L641 you have the `rawTokenPrediction` array of token indices, one per frame. Then in the loop over this array you call ``` std::array tokens; for (auto index : rawTokenPrediction)...

Well, "#" is the CTC blank token. Also, if I remember correctly (https://github.com/flashlight/flashlight/blob/master/flashlight/lib/text/decoder/LexiconDecoder.cpp#L257, https://github.com/flashlight/flashlight/blob/master/flashlight/lib/text/decoder/LexiconDecoder.cpp#L27), you need to remove the first and last silence tokens, as we add them artificially during decoding. Then...
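A minimal sketch of the trimming step described above, assuming the tokens have already been mapped from indices to strings and that the silence token is written as "|". Both the helper name `trimPaddingSilence` and the "|" default are assumptions for illustration, not part of flashlight's API.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: drop the first and last token when they are the
// silence token (assumed here to be "|"), since the comment above says the
// decoder adds them artificially. Tokens in the middle are left untouched.
std::vector<std::string> trimPaddingSilence(
    std::vector<std::string> tokens,
    const std::string& silence = "|") {
  if (!tokens.empty() && tokens.back() == silence) {
    tokens.pop_back();
  }
  if (!tokens.empty() && tokens.front() == silence) {
    tokens.erase(tokens.begin());
  }
  return tokens;
}
```

The checks against `silence` are a safety net: if the first or last token is not silence, the sequence is returned unchanged at that end.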

> * Please correct me, if my calculations are wrong. > * Yep, it looks correct to me. Again, total duration after removing first and last frame now looks correct....

Yep, correct: the stride of the arch is 1 and the data preprocessing step is 10ms, so each frame after the network corresponds to 10ms of audio. > * How to know the stride value from...
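To make the frame-to-time arithmetic concrete, here is a rough sketch that collapses per-frame token strings into timed spans, assuming 10ms per output frame (stride 1, 10ms preprocessing step, as in the comment above) and "#" as the CTC blank. The `TokenSpan` struct and the `framesToSpans` function are hypothetical names for illustration; they are not taken from Decode.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One run of identical consecutive frame predictions.
struct TokenSpan {
  std::string token;
  double startMs; // inclusive start time of the run
  double endMs;   // exclusive end time of the run
};

// Hypothetical helper: group consecutive identical frame tokens into spans
// and convert frame positions to milliseconds. Blank runs are skipped.
// frameMs = 10 assumes stride 1 and a 10ms preprocessing step.
std::vector<TokenSpan> framesToSpans(
    const std::vector<std::string>& frameTokens,
    double frameMs = 10.0,
    const std::string& blank = "#") {
  assert(frameMs > 0);
  std::vector<TokenSpan> spans;
  std::size_t i = 0;
  while (i < frameTokens.size()) {
    // Advance j to the end of the run that starts at frame i.
    std::size_t j = i;
    while (j < frameTokens.size() && frameTokens[j] == frameTokens[i]) {
      ++j;
    }
    if (frameTokens[i] != blank) {
      spans.push_back({frameTokens[i], i * frameMs, j * frameMs});
    }
    i = j;
  }
  return spans;
}
```

For a model with a different total stride, `frameMs` would be the preprocessing step multiplied by that stride; word timings can then be obtained by merging the spans that fall between two silence tokens.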

We don't support this kind of training right now, and it is not on our agenda for this half. Pull requests are always welcome, so feel free to add this...

You could post a link to the code, or the code itself, directly here; someone from our team or from the community will probably have a look at it and could help you.

cc @vineelpratap