CTC-OptimizedLoss
Why does MWER use stop gradient?
Why does MWER use stop gradient? Is it just a regularization?
Maybe it is for variance reduction.
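A worked derivation of that idea, under stated assumptions: the thread does not show the loss code, so this assumes the common N-best MWER form, where hypothesis probabilities are renormalized over the N-best list and the stop gradient is applied to the mean word error $\bar W$. Here $W_i = W(y_i, y^*)$ is the number of word errors of hypothesis $y_i$ against the reference $y^*$, $\bar W = \frac{1}{N}\sum_j W_j$, and $s_i = \log P(y_i \mid x)$.

```latex
\mathcal{L}_{\text{MWER}} = \sum_{i=1}^{N} \hat{P}_i \,\bigl(W_i - \bar{W}\bigr),
\qquad
\hat{P}_i = \frac{P(y_i \mid x)}{\sum_{j=1}^{N} P(y_j \mid x)}
          = \frac{e^{s_i}}{\sum_{j=1}^{N} e^{s_j}}

% With \bar{W} held constant (stop gradient) and
% \partial \hat{P}_k / \partial s_i = \hat{P}_k (\delta_{ki} - \hat{P}_i):
\frac{\partial \mathcal{L}_{\text{MWER}}}{\partial s_i}
  = \hat{P}_i \bigl(W_i - \bar{W}\bigr)
    - \hat{P}_i \sum_{k=1}^{N} \hat{P}_k \bigl(W_k - \bar{W}\bigr)
  = \hat{P}_i \Bigl(W_i - \sum_{k=1}^{N} \hat{P}_k W_k\Bigr)
```

In this form $\bar W$ cancels out of the exact gradient, so it does not act as a regularizer: it shifts the loss value but leaves the update unchanged. Its usual justification is as a baseline for variance reduction.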
I find that TF CTC beam search loses the gradients.
Beam search is only used to find candidate paths, so no gradient is needed through it. Gradients still reach the logit weights because the probabilities P fed into the MWER loss are computed from the logits. The N-best paths from CTC beam search can even be generated offline to speed up training.
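A minimal pure-Python sketch of this gradient flow. It assumes the common N-best MWER formulation: softmax-renormalized hypothesis probabilities, weighted by word errors minus the mean word error. The helpers `edit_distance` and `mwer_loss_and_grad` are hypothetical names for illustration, not code from CTC-OptimizedLoss. The N-best list enters only as fixed label sequences, as it would if produced by beam search or precomputed offline. The gradient is taken with respect to each hypothesis's log-probability. In a real model that value would be the CTC log-likelihood of the label sequence computed from the logits, which is what carries the gradient back to them.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (word errors)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,              # delete h
                         cur[j - 1] + 1,           # insert r
                         prev[j - 1] + (h != r))   # substitute or match
        prev = cur
    return prev[-1]


def mwer_loss_and_grad(log_probs: List[float],
                       nbest: List[Sequence[str]],
                       ref: Sequence[str]) -> Tuple[float, List[float]]:
    """N-best MWER loss and its gradient w.r.t. each hypothesis log-prob.

    `nbest` is fixed data (no gradient), e.g. beam-search output computed
    offline; only `log_probs` are treated as differentiable inputs.
    """
    errors = [float(edit_distance(h, ref)) for h in nbest]
    # Renormalize over the N-best list: numerically stable softmax.
    m = max(log_probs)
    exps = [math.exp(lp - m) for lp in log_probs]
    z = sum(exps)
    post = [e / z for e in exps]
    # Baseline: mean word errors, held constant (the stop-gradient part).
    baseline = sum(errors) / len(errors)
    loss = sum(p * (w - baseline) for p, w in zip(post, errors))
    # Analytic gradient: post_i * (w_i - expected word errors).
    expected = sum(p * w for p, w in zip(post, errors))
    grad = [p * (w - expected) for p, w in zip(post, errors)]
    return loss, grad


if __name__ == "__main__":
    ref = "the cat sat".split()
    nbest = ["the cat sat".split(), "the cat sad".split(), "a cat sad".split()]
    loss, grad = mwer_loss_and_grad([-1.0, -0.5, -2.0], nbest, ref)
    print(loss, grad)
```

The gradient components sum to zero and are negative for the error-free hypothesis, so a descent step raises its log-probability relative to the others. Because only `log_probs` need a gradient, the decoder that produced `nbest` never has to be differentiable.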
