snowfall
[WIP] Implement 2nd pass training using 1-best decoding results from the 1st pass network
This PR implements the idea proposed in https://github.com/k2-fsa/snowfall/pull/106#issuecomment-803796177:
BTW, since it seems this is hard to get to work, if you feel like it you could work on a simpler idea. At training time we'd take the best-path alignment and, using some kind of RNN or masked attention, predict the next label in that best path. (We'd probably take the output of the 1st network as an input to that.) The sequence length here is the same as the original sequence length. At test time, the way this would work, at least initially, is that we'd run this on n-best lists obtained from the 1st-pass decoding and use the scores to decide which of the n-best paths to keep. There are more accurate decoding methods we could look into later.
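For readers skimming the thread, here is a rough, library-free sketch of the idea in the quoted comment. It is not the code in this PR. All names (`next_label_pairs`, `second_pass_score`, `rescore_nbest`, `toy_log_prob`, `BLANK`, `scale`) are hypothetical, the toy scorer stands in for the RNN or masked-attention model, and the 1st-network output that the real model would also take as input is left out. Adding the 1st-pass score to the 2nd-pass score is also an assumption; the comment only says the scores are used to pick among the n-best paths.

```python
import math
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical start symbol fed in before the first frame

def next_label_pairs(alignment: Sequence[int],
                     start: int = BLANK) -> List[Tuple[int, int]]:
    """Build (previous label, target label) pairs from a best-path alignment.

    There is one pair per frame, so the output length equals the
    original sequence length.
    """
    prev = [start] + list(alignment[:-1])
    return list(zip(prev, alignment))

def second_pass_score(path: Sequence[int],
                      log_prob: Callable[[int, int], float]) -> float:
    """Sum the next-label log-probabilities along one path."""
    return sum(log_prob(p, t) for p, t in next_label_pairs(path))

def rescore_nbest(nbest: List[Tuple[List[int], float]],
                  log_prob: Callable[[int, int], float],
                  scale: float = 1.0) -> List[int]:
    """Return the path with the highest combined score.

    nbest holds (path, first_pass_score) entries. Combining the two
    scores with a weight `scale` is an assumption of this sketch.
    """
    def total(item: Tuple[List[int], float]) -> float:
        path, first_pass = item
        return first_pass + scale * second_pass_score(path, log_prob)
    return max(nbest, key=total)[0]

def toy_log_prob(prev: int, label: int) -> float:
    """Toy stand-in for the 2nd-pass model: it favors repeated labels."""
    return math.log(0.9) if label == prev else math.log(0.1)

# Two frame-level hypotheses with their 1st-pass scores.
nbest = [([1, 1, 2, 2], -1.0), ([1, 2, 1, 2], -0.9)]
best = rescore_nbest(nbest, toy_log_prob)
```

With this toy scorer, the second hypothesis has the better 1st-pass score, but the first one wins after rescoring because the scorer penalizes every label change.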
The training objective (objf) is decreasing and seems to be converging. I will post the decoding results later.