gpt-fast
gpt-fast copied to clipboard

Published 20 hours ago •

Reame
Issues

Naming: n_local_heads -> n_kv_heads

Open ad8e opened this issue 1 year ago • 0 comments

n_local_heads refers to TP sharding, rather than GQA.

Apr 23 '24 18:04 ad8e