ng-video-lecture

wei value not 100% per row after dropout

Open: guyko81 opened this issue 2 years ago • 1 comment

Something doesn't make sense to me. In this code:

        wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5 # (B, T, hs) @ (B, hs, T) -> (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B, T, T)
        wei = F.softmax(wei, dim=-1) # (B, T, T)

each row of wei sums to 100% after the softmax, but after applying dropout

        wei = self.dropout(wei)

the row sums can go above 100%. Is there a reason for that? Does it cause any issues? I assume the overall calculation shouldn't be affected too much, since other parts of the network can compensate, but still.

guyko81, Sep 07 '23 22:09

I'm running this in a Jupyter notebook, so there might be some inconsistencies, but I don't see this when I intercept wei and check its values. Can you provide some more detail about how you're observing it?

fasterinnerlooper, Feb 04 '24 06:02
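
A likely explanation, assuming `self.dropout` is PyTorch's `nn.Dropout`: in training mode it zeroes each element with probability `p` and scales the survivors by `1/(1-p)`, so a softmax row that summed to 1 will generally sum to something else afterwards (up to `1/(1-p)` if nothing in the row is dropped). In eval mode the module is an identity, which may be why the rows still sum to 1 when checked in a notebook. The sketch below mimics that behavior in plain Python rather than calling PyTorch; `inverted_dropout`, the example scores, and `p=0.2` are illustrative assumptions, not code from the repository.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def inverted_dropout(row, p, training, rng):
    """Zero each element with probability p and scale survivors by 1/(1-p).

    With training=False (eval mode) the input is returned unchanged.
    Hypothetical helper that imitates the documented nn.Dropout behavior.
    """
    if not training or p == 0.0:
        return list(row)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in row]

rng = random.Random(0)
wei_row = softmax([0.5, 1.0, -0.3, 2.0])  # one attention row, sums to 1

train_row = inverted_dropout(wei_row, p=0.2, training=True, rng=rng)
eval_row = inverted_dropout(wei_row, p=0.2, training=False, rng=rng)

print(sum(wei_row))    # 1 up to floating-point error
print(sum(train_row))  # generally not 1: survivors are scaled by 1/(1-p)
print(sum(eval_row))   # same as sum(wei_row)

# Averaged over many random masks, the row sum comes back to about 1.
n = 20000
mean_sum = sum(
    sum(inverted_dropout(wei_row, 0.2, True, rng)) for _ in range(n)
) / n
print(mean_sum)
```

The rescaling keeps the expected value of each weight unchanged, so the rows sum to 1 on average even though any single training-time row does not. At inference (after `model.eval()`), dropout is disabled and the softmax rows are used as they are.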