bethant9
Sorry, I should have clarified: I am training the model with a larger right context length (and then decoding with the same values).
Chunk size 32 + right context length 8 (the default) vs. chunk size 32 + right context length 32. The second experiment gives worse results so far.
I tried right context length (RCL) = 16: it was better than RCL = 32, but still worse than RCL = 8. From my rough experiments, it seems that increasing RCL degrades the results.
Just wondering, is the training input padded with right_context_length frames?
Hi, I found that I needed to pad the training data with right context length frames; otherwise, right context length frames are removed from the input during training, which means the ends of utterances are never trained on.
Yes, exactly. The ends of utterances aren't trained correctly, which leads to a high deletion error.
I solved this by padding the input with right context length frames just before the Emformer.
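For reference, a minimal sketch of that padding step, assuming torchaudio-style inputs of shape (batch, time, feature_dim) and a lengths tensor of valid frame counts. The helper name pad_right_context and the use of zero-padding are illustrative, not the exact code used:

```python
import torch
import torch.nn.functional as F

def pad_right_context(features: torch.Tensor,
                      lengths: torch.Tensor,
                      right_context_length: int):
    """Zero-pad the time dimension with right_context_length extra frames so
    the Emformer's last chunk still has right context and the final frames of
    each utterance are trained on.

    features: (batch, time, feature_dim); lengths: (batch,) valid frame counts.
    """
    # F.pad pads the last dims first: (left_D, right_D, left_T, right_T),
    # so this appends right_context_length frames at the end of the time axis.
    padded = F.pad(features, (0, 0, 0, right_context_length))
    # Valid lengths are unchanged; the extra frames serve only as right context.
    return padded, lengths

# Hypothetical usage just before the Emformer module:
# padded, lengths = pad_right_context(features, lengths, right_context_length=8)
# output, out_lengths = emformer(padded, lengths)
```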
No, but I'm happy for you to do that if you want.