I have the same issue. Why was the loss calculated on all tokens?
@EmaadKhwaja `return logits[~mask], target[~mask]` seems problematic; we should compute the loss on the masked tokens instead: `return logits[mask], target[mask]`.
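For reference, here is a minimal sketch (not the repository's code) of how the loss would be restricted to the masked positions, assuming `logits` of shape `(batch, seq_len, vocab)`, `target` of shape `(batch, seq_len)`, and a boolean `mask` that is `True` at the masked positions:

```python
import torch
import torch.nn.functional as F

def masked_token_loss(logits: torch.Tensor, target: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Boolean indexing keeps only the masked positions:
    #   logits[mask] -> (num_masked, vocab), target[mask] -> (num_masked,)
    # so cross-entropy is averaged over masked tokens only.
    return F.cross_entropy(logits[mask], target[mask])
```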
> @xuesongnie it's because the calculated mask is applied to the wrong values. The other option would be to do `r = math.floor((1 - self.gamma(np.random.uniform())) * z_indices.shape[1])`, but I don't like that...
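To make the two options concrete, here is a hedged sketch using the variable names from the snippets above (`self.gamma`, `z_indices`); it is an illustration under those assumptions, not the released code:

```python
import math
import numpy as np
import torch

def sample_mask(z_indices: torch.Tensor, gamma) -> torch.Tensor:
    """Sample a boolean mask over token positions (illustrative only)."""
    L = z_indices.shape[1]
    # Option A: mask r = floor(gamma(u) * L) positions, so `mask` below marks
    # the MASKED tokens and the loss must use logits[mask], target[mask].
    # Option B (the alternative quoted above) would instead use
    # r = math.floor((1 - gamma(np.random.uniform())) * L), which flips the
    # semantics: `mask` would then mark the KEPT tokens and the masked
    # positions would be `~mask`.
    r = math.floor(gamma(np.random.uniform()) * L)
    sample = torch.rand(z_indices.shape, device=z_indices.device).topk(r, dim=1).indices
    mask = torch.zeros(z_indices.shape, dtype=torch.bool, device=z_indices.device)
    mask.scatter_(dim=1, index=sample, value=True)
    return mask
```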
When will the code be released?