Shouldn't we be dividing when normalizing QK^T, not multiplying?

Open tylerkastner opened this issue 5 months ago • 0 comments

In the code below, the query-key dot product is normalized by multiplying by the square root of the head size:

https://github.com/karpathy/ng-video-lecture/blob/52201428ed7b46804849dea0b3ccf0de9df1a5c3/gpt.py#L83

Shouldn't we be dividing instead, as in the original paper?

[Screenshot 2024-09-19 at 2 26 20 PM]
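To make the comparison concrete, here is a minimal sketch in plain Python rather than the repository's PyTorch. It does not reproduce the linked line; `head_size`, `q`, and `k` are made-up values for illustration. It computes one query-key score three ways: divided by the square root of the head size, multiplied by the head size raised to the power -0.5 (the same scaling written as a multiplication), and multiplied by the square root of the head size.

```python
import math

head_size = 16  # hypothetical head dimension, chosen for illustration
q = [0.5, -1.0, 2.0, 0.25] * 4  # made-up query vector of length head_size
k = [1.0, 0.5, -0.5, 2.0] * 4   # made-up key vector of length head_size

# Raw (unscaled) dot product between the query and the key.
score = sum(qi * ki for qi, ki in zip(q, k))

# Dividing by sqrt(head_size).
divided = score / math.sqrt(head_size)

# Multiplying by head_size ** -0.5, which is 1 / sqrt(head_size).
neg_power = score * head_size ** -0.5

# Multiplying by sqrt(head_size), i.e. scaling up instead of down.
multiplied = score * math.sqrt(head_size)

print(divided, neg_power, multiplied)
```

The first two values agree, while the third is larger in magnitude by a factor of `head_size`, so whether the line matches the paper depends on the sign of the exponent it uses.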

tylerkastner, Sep 19 '24 18:09