gpt-fast
gpt-fast copied to clipboard
Naming: n_local_heads -> n_kv_heads
n_local_heads refers to TP sharding, rather than GQA.