icefall
[Transducer Loss] Why not normalize the transducer loss?
Can you explain why you do not normalize the transducer loss? If I increase the batch size, the gradients will become larger. Can the model still converge?
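The premise of the question can be illustrated with a minimal Python sketch. It assumes a toy quadratic per-utterance loss standing in for the transducer loss, and the function names `per_utterance_grad` and `batch_grad` are hypothetical, not icefall or k2 APIs. With a summed (unnormalized) loss, the gradient grows linearly with the batch size. With a mean-reduced loss, it stays the same.

```python
def per_utterance_grad(w, x):
    # Gradient of a toy per-utterance loss l(w) = (w - x)**2 w.r.t. w.
    # This is a stand-in for a real transducer loss (assumption).
    return 2.0 * (w - x)


def batch_grad(w, batch, reduction="sum"):
    # Accumulate per-utterance gradients over the batch.
    total = sum(per_utterance_grad(w, x) for x in batch)
    if reduction == "sum":
        # Unnormalized: the gradient scales with len(batch).
        return total
    if reduction == "mean":
        # Normalized by batch size: the gradient is independent of len(batch).
        return total / len(batch)
    raise ValueError(f"unknown reduction: {reduction}")


w = 0.0
small = [1.0] * 8
large = [1.0] * 32

print(batch_grad(w, small, "sum"))   # -16.0
print(batch_grad(w, large, "sum"))   # -64.0
print(batch_grad(w, small, "mean"))  # -2.0
print(batch_grad(w, large, "mean"))  # -2.0
```

Quadrupling the batch quadruples the summed gradient, which is the effect described above. Whether that harms convergence depends on the optimizer and learning-rate setup, which this toy example does not model.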