Bug in decoder_padding_mask in BPE training
See the code at the following lines:
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L162
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L167
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L179
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L709
https://github.com/k2-fsa/snowfall/blob/350253144af04c295f560cdb976f817dc13b2993/snowfall/models/transformer.py#L720-L721
You can see that `ys_in_pad` is padded with `eos_id`, which is a positive word-piece ID. However, the padding mask for `ys_in_pad` is computed by comparing against `-1`, so the padded positions are never actually masked.
This bug may explain why the WERs differ with the decoding batch size; I suspect it also affects training.
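To make the mismatch concrete, here is a minimal sketch. The helper below mirrors the masking logic at the linked lines (its exact signature is an assumption), and the word-piece IDs are made up for illustration:

```python
import torch


def decoder_padding_mask(ys_pad: torch.Tensor, ignore_id: int = -1) -> torch.Tensor:
    # Marks positions equal to ignore_id as padding (assumed to mirror the linked helper).
    return ys_pad == ignore_id


eos_id = 1  # a positive word-piece ID, used to pad ys_in_pad

# Two decoder input sequences of different lengths, padded with eos_id.
ys_in_pad = torch.tensor([[3, 5, 7, eos_id, eos_id],
                          [3, 5, eos_id, eos_id, eos_id]])

# Current behaviour: comparing against -1 marks nothing as padding.
mask_buggy = decoder_padding_mask(ys_in_pad)          # all False
# Comparing against the value actually used for padding marks the padded positions.
mask_fixed = decoder_padding_mask(ys_in_pad, eos_id)  # True at padded positions

print(mask_buggy)
print(mask_fixed)
```

With the current code, `mask_buggy` is all `False`, so the decoder attends to the padded `eos_id` positions, and the amount of padding (hence the batch size) can change the output.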