nanoGPT
Why don't we crop attn.weight as well?
The crop_block_size method in the GPT class only crops wpe.weight and attn.bias. Don't we also need to crop attn.weight, whose shape is (B, T, C), where T is block_size?
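attn.weight (the projection matrix) is independent of T (block_size). It acts on the channel dimension C of each token, so its shape depends only on the embedding size, not on the sequence length. (B, T, C) is the shape of the activations passing through the layer, not of the weight. Only wpe.weight (one row per position) and attn.bias (the causal mask) have a dimension of length block_size, which is why those are the only two that get cropped.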
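To make this concrete, here is a minimal sketch in plain Python that lists the shapes involved and checks which ones change when block_size shrinks. The helper names `shapes_for` and `names_needing_crop` are hypothetical, and the shapes are an assumption based on the usual GPT-2 layout (a fused query/key/value projection of size 3 * n_embd, with linear weights stored as (out_features, in_features)). It is not code from the repository.

```python
def shapes_for(n_embd, block_size):
    """Assumed tensor shapes for one attention block plus the position embedding."""
    return {
        # one learned embedding row per position -> depends on block_size
        "wpe.weight": (block_size, n_embd),
        # fused q/k/v projection acts on channels only -> no block_size
        "attn.c_attn.weight": (3 * n_embd, n_embd),
        # output projection acts on channels only -> no block_size
        "attn.c_proj.weight": (n_embd, n_embd),
        # lower-triangular causal mask buffer -> depends on block_size
        "attn.bias": (1, 1, block_size, block_size),
    }


def names_needing_crop(n_embd, old_block_size, new_block_size):
    """Return the tensor names whose shape differs between the two block sizes."""
    before = shapes_for(n_embd, old_block_size)
    after = shapes_for(n_embd, new_block_size)
    return sorted(name for name in before if before[name] != after[name])


print(names_needing_crop(n_embd=768, old_block_size=1024, new_block_size=256))
# → ['attn.bias', 'wpe.weight']
```

The projection weights come out identical for both block sizes, so there is nothing in them to crop.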