gpt-fast
Update sdpa function with enable_gqa=True
For the Llama model, set enable_gqa=True in the scaled_dot_product_attention (SDPA) call so that it uses PyTorch's built-in grouped-query attention (GQA) support.