sglang
How does RadixAttention implement multi-head, multi-query, and grouped-query attention?
How would the RadixAttention function call in SGLang need to be modified for a model implemented in vLLM, where PagedAttention already handles the multi-query and grouped-query attention architectures?