sglang
How does RadixAttention implement multi-head, multi-query, and grouped-query attention?
How would the RadixAttention function call in SGLang need to be modified for a model implemented in vLLM, where PagedAttention already handles the multi-query and grouped-query attention architectures?