Hongqiang Guo
Results: 2 issues of Hongqiang Guo
@karpathy First of all, thank you so much for sharing your knowledge. I updated the initialization of self.vocab because I don't feel we need to call self._build_vocab(). I also cleaned...
The crop_block_size method in the GPT class only crops wpe.weight and attn.bias. Don't we also need to crop attn.weight, whose shape is (B*T*C), where T is block_size?
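One way to reason about this question is to list which stored tensors actually carry a block_size dimension. The sketch below is plain shape bookkeeping, not the repository's code: the layout, the helper names parameter_shapes and depends_on_block_size, and the c_attn / c_proj projection names are assumptions made for illustration. Under this assumed layout, a (B, T, C) tensor would be an activation computed per forward pass rather than a stored parameter.

```python
# Hypothetical shape bookkeeping for a small GPT-style model.
# The tensor names echo the issue text (wpe.weight, attn.bias); the layout
# itself is an assumption, not code taken from the repository.

def parameter_shapes(block_size, n_embd, vocab_size):
    """Return the assumed shapes of the tensors stored in the model."""
    return {
        "wte.weight": (vocab_size, n_embd),           # token embedding
        "wpe.weight": (block_size, n_embd),           # position embedding
        "attn.c_attn.weight": (3 * n_embd, n_embd),   # q/k/v projection (assumed name)
        "attn.c_proj.weight": (n_embd, n_embd),       # output projection (assumed name)
        "attn.bias": (1, 1, block_size, block_size),  # causal mask buffer
    }


def depends_on_block_size(n_embd=8, vocab_size=16):
    """List the tensors whose shape changes when only block_size changes."""
    small = parameter_shapes(4, n_embd, vocab_size)
    large = parameter_shapes(6, n_embd, vocab_size)
    return sorted(name for name in small if small[name] != large[name])


print(depends_on_block_size())
```

If the real model follows this layout, only the entries reported here would need cropping when block_size shrinks.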