self-debiasing
Add _apply_decay_mask_logits function to avoid numerical instability
Thanks for the comment.
I think we can keep both versions. If there is no numerical instability issue, we can of course use _apply_decay_mask. Also, if you have time, you could run both versions and compare the two implementations. I expect the results not to differ much.
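The sketch below shows one way to run that comparison. It is a minimal, self-contained illustration and not the repository's code. It assumes the decay factor is exp(-decay_constant * penalty), floored at epsilon, where penalty is a non-negative per-token score. The functions apply_decay_mask_probs and apply_decay_mask_logits and their parameters are hypothetical stand-ins for _apply_decay_mask and _apply_decay_mask_logits. The probability-space version multiplies the softmax output by the factor and renormalizes. The log-space version adds log(factor) to the logits before the softmax. The two are mathematically equal because softmax(z + log f) = normalize(softmax(z) * f), but the log-space version never forms exp(-decay_constant * penalty) directly, so that factor cannot underflow.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_decay_mask_probs(logits: List[float], penalties: List[float],
                           decay_constant: float, epsilon: float) -> List[float]:
    """Hypothetical probability-space version (stand-in for _apply_decay_mask).

    Assumption: factor = max(exp(-decay_constant * penalty), epsilon).
    """
    probs = softmax(logits)
    factors = [max(math.exp(-decay_constant * p), epsilon) for p in penalties]
    scaled = [p * f for p, f in zip(probs, factors)]
    total = sum(scaled)  # can underflow toward 0 for extreme inputs
    return [s / total for s in scaled]


def apply_decay_mask_logits(logits: List[float], penalties: List[float],
                            decay_constant: float, epsilon: float) -> List[float]:
    """Hypothetical log-space version (stand-in for _apply_decay_mask_logits).

    Adds log(factor) to each logit instead of multiplying probabilities.
    log(max(exp(a), eps)) == max(a, log(eps)) because exp is monotonic.
    """
    log_eps = math.log(epsilon)
    shifted = [z + max(-decay_constant * p, log_eps)
               for z, p in zip(logits, penalties)]
    return softmax(shifted)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    penalties = [0.0, 0.3, 0.0, 0.8]
    p = apply_decay_mask_probs(logits, penalties, decay_constant=50.0, epsilon=0.01)
    q = apply_decay_mask_logits(logits, penalties, decay_constant=50.0, epsilon=0.01)
    # On moderate inputs the two versions should agree to floating-point precision.
    print("max abs difference:", max(abs(a - b) for a, b in zip(p, q)))
```

On moderate inputs like the ones above, the two outputs should match to within rounding error. Any real difference would appear only with very large penalties or very negative logits, where the probability-space product can underflow.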