Bug in decoder_padding_mask in BPE training
See the code at the following lines:
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L162
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L167
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L179
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L709
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L720-L721
You can see that `ys_in_pad` is padded with `eos_id`, which is a positive word-piece ID. However, the padding mask for `ys_in_pad` is computed by comparing against `-1`, so the padded positions are never actually masked.
This bug may explain why the WERs differ with the decoding batch size; I suspect it also affects training.
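To make the mismatch concrete, here is a minimal sketch. The helper below mirrors the masking logic at the linked lines (its exact signature is an assumption), and the word-piece IDs are made up for illustration:

```python
import torch


def decoder_padding_mask(ys_pad: torch.Tensor, ignore_id: int = -1) -> torch.Tensor:
    # Marks positions equal to ignore_id as padding (assumed to mirror the linked helper).
    return ys_pad == ignore_id


eos_id = 1  # a positive word-piece ID, used to pad ys_in_pad

# Two decoder input sequences of different lengths, padded with eos_id.
ys_in_pad = torch.tensor([[3, 5, 7, eos_id, eos_id],
                          [3, 5, eos_id, eos_id, eos_id]])

# Current behaviour: comparing against -1 marks nothing as padding.
mask_buggy = decoder_padding_mask(ys_in_pad)          # all False
# Comparing against the value actually used for padding marks the padded positions.
mask_fixed = decoder_padding_mask(ys_in_pad, eos_id)  # True at padded positions

print(mask_buggy)
print(mask_fixed)
```

With the current code, `mask_buggy` is all `False`, so the decoder attends to the padded `eos_id` positions, and the amount of padding (hence the batch size) can change the output.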