Why don't we crop attn.weight as well?

Open muerghq opened this issue 11 months ago • 1 comments

The crop_block_size method in the GPT class only crops wpe.weight and attn.bias. Don't we also need to crop attn.weight, whose shape is (B, T, C), where T is block_size?

muerghq avatar Mar 05 '24 03:03 muerghq

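attn.weight (the projection matrix) is independent of T (i.e. block_size). The attention projections are linear layers that act only on the channel dimension C, so their weight shapes depend on n_embd alone; (B, T, C) is the shape of the activations, not of the weights. Only wpe.weight, which has one row per position, and the causal mask buffer attn.bias, of shape (1, 1, block_size, block_size), scale with block_size.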

JJJYmmm avatar Apr 05 '24 11:04 JJJYmmm
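
To illustrate the answer above, here is a minimal sketch in plain Python that lists the shapes involved and reports which ones change with block_size. The helper names `gpt_shapes` and `tensors_depending_on_block_size` are hypothetical, and the shapes are an assumption based on nanoGPT's usual layout (a `c_attn` and a `c_proj` linear layer inside each attention module); check them against model.py before relying on them.

```python
def gpt_shapes(block_size, n_embd):
    """Shapes of the tensors discussed in this issue (assumed nanoGPT layout)."""
    return {
        # Position embedding: one row per position, so it depends on block_size.
        "wpe.weight": (block_size, n_embd),
        # Fused q/k/v projection: acts on the channel dimension only.
        "attn.c_attn.weight": (3 * n_embd, n_embd),
        # Output projection: also channel-only.
        "attn.c_proj.weight": (n_embd, n_embd),
        # Causal mask buffer: a lower-triangular block_size x block_size matrix.
        "attn.bias": (1, 1, block_size, block_size),
    }


def tensors_depending_on_block_size(old_block_size, new_block_size, n_embd):
    """Return the names whose shape differs between the two block sizes."""
    before = gpt_shapes(old_block_size, n_embd)
    after = gpt_shapes(new_block_size, n_embd)
    return sorted(name for name in before if before[name] != after[name])


if __name__ == "__main__":
    # Shrinking the context from 1024 to 256 positions with n_embd = 768.
    print(tensors_depending_on_block_size(1024, 256, 768))
```

Under these assumptions, only `attn.bias` and `wpe.weight` are reported, which matches the two tensors that crop_block_size crops.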