gpt-fast

Published 20 hours ago

Update sdpa function with enable_gqa=True

Open jainapurva opened this issue 1 year ago • 1 comment

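For the Llama model, set enable_gqa=True in the sdpa (scaled_dot_product_attention) call to use PyTorch's built-in grouped query attention (GQA) support, so that the key and value heads no longer need to be expanded by hand to match the number of query heads.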

Jul 13 '24 03:07 jainapurva
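
To make the request concrete, here is a minimal pure-Python sketch of what grouped query attention does to the heads. It is an illustration, not gpt-fast code: the helper names (attend, repeat_interleave, sdpa_expanded, sdpa_grouped) and the toy inputs are assumptions made up for this example, and masking and dropout are left out. As far as I know, the PyTorch documentation for torch.nn.functional.scaled_dot_product_attention describes enable_gqa=True (available in recent releases, 2.5 or later I believe) as equivalent to repeating each key/value head along the head dimension; the sketch checks that this matches simply sharing one key/value head across a group of query heads.

```python
import math


def attend(q, k, v):
    """Single-head scaled dot-product attention on plain lists.

    q: [Lq][D], k: [Lk][D], v: [Lk][Dv] -> [Lq][Dv]. No mask, no dropout.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out


def repeat_interleave(heads, n_rep):
    """Repeat each head n_rep times in place: [a, b] -> [a, a, b, b]."""
    return [h for h in heads for _ in range(n_rep)]


def sdpa_expanded(q_heads, k_heads, v_heads):
    """Manual approach: expand K/V to one head per query head first."""
    n_rep = len(q_heads) // len(k_heads)
    k_full = repeat_interleave(k_heads, n_rep)
    v_full = repeat_interleave(v_heads, n_rep)
    return [attend(q, k, v) for q, k, v in zip(q_heads, k_full, v_full)]


def sdpa_grouped(q_heads, k_heads, v_heads):
    """Grouped approach: query head i reads K/V head i // n_rep directly."""
    n_rep = len(q_heads) // len(k_heads)
    return [
        attend(q, k_heads[i // n_rep], v_heads[i // n_rep])
        for i, q in enumerate(q_heads)
    ]


# Toy input (hypothetical values): 4 query heads share 2 key/value heads,
# sequence length 2, head dimension 2.
q_heads = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
    [[1.0, -1.0], [-1.0, 1.0]],
]
k_heads = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -0.5], [1.5, 0.5]],
]
v_heads = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[-1.0, 0.0], [0.0, 1.0]],
]

same = sdpa_grouped(q_heads, k_heads, v_heads) == sdpa_expanded(q_heads, k_heads, v_heads)
print("grouped result matches expanded result:", same)
```

In PyTorch terms, the change asked for here would amount to passing the unexpanded key and value tensors to the sdpa call together with enable_gqa=True, instead of expanding them beforehand. Whether that holds for every code path in the Llama model is something to verify against the repository and the installed PyTorch version.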