vanilla

Results 4 comments of vanilla

I have the same issue. Why is the loss calculated on all tokens?

@EmaadKhwaja `return logits[~mask], target[~mask]` seems a bit problematic. We should calculate the loss on the masked tokens only: `return logits[mask], target[mask]`
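To make the difference concrete, here is a minimal NumPy sketch (not the repository's code) comparing the loss over masked positions with the loss over all tokens. It assumes `True` in `mask` marks a token that was masked out; if the convention is the opposite (`True` = token kept), then `~mask` is the selection that picks the masked tokens. The names `select_masked` and `cross_entropy` and the toy shapes are made up for illustration.

```python
import numpy as np

def select_masked(logits, target, mask):
    """Keep only the positions where mask is True.

    Assumption: True means "this token was masked out", so these are
    the positions the model has to predict.
    """
    return logits[mask], target[mask]

def cross_entropy(logits, target):
    """Mean cross-entropy of (N, vocab) logits against (N,) integer targets."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(target.shape[0]), target].mean()

rng = np.random.default_rng(0)
batch, seq_len, vocab = 2, 6, 10  # toy sizes, chosen arbitrarily
logits = rng.normal(size=(batch, seq_len, vocab))
target = rng.integers(0, vocab, size=(batch, seq_len))

# Pretend the first two tokens of every sequence were masked out.
mask = np.zeros((batch, seq_len), dtype=bool)
mask[:, :2] = True

masked_logits, masked_target = select_masked(logits, target, mask)
loss_masked = cross_entropy(masked_logits, masked_target)

# For comparison: loss over every token, masked or not.
loss_all = cross_entropy(logits.reshape(-1, vocab), target.reshape(-1))
```

Boolean indexing flattens the batch and sequence dimensions, so `masked_logits` has one row per selected token and can be passed straight to a cross-entropy function.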

> @xuesongnie it's because the computed mask is applied to the wrong values. The other option would be to do `r = math.floor((1 - self.gamma(np.random.uniform())) * z_indices.shape[1])`, but I don't like that...
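
For reference, a small sketch of how the two token counts relate. The cosine `gamma` below is only an assumption (the schedule is not shown in this thread), and `split_counts` is a made-up helper. Whether `r` counts kept or masked tokens depends on the mask convention used in the training code.

```python
import math

def gamma(ratio):
    """Hypothetical cosine schedule; the real self.gamma may differ."""
    return math.cos(ratio * math.pi / 2)

def split_counts(seq_len, u):
    """Return (r, r_complement) for a uniform sample u in [0, 1)."""
    g = gamma(u)
    r = math.floor(g * seq_len)                   # count taken from gamma(u)
    r_complement = math.floor((1 - g) * seq_len)  # count taken from 1 - gamma(u)
    return r, r_complement

r, r_complement = split_counts(256, 0.5)

# Without parentheses the multiplication binds first, so the result
# is 1 minus a large number, i.e. negative.
r_unparenthesized = math.floor(1 - gamma(0.5) * 256)
```

Because both counts are floored, they can sum to one less than the sequence length, so the second form is not an exact complement of the first.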

When will the code be released?