tvm
[Tracking Issue] Need support for GQA Attention in Relax
Grouped-query attention (GQA) is critical for modern LLM compilation workflows, but Relax does not currently support it. Native GQA support in Relax would enable better performance and compatibility with transformer-based models exported from PyTorch, HuggingFace, or ONNX.
https://arxiv.org/abs/2305.13245
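For reference, here is a minimal pure-Python sketch of the computation being requested, assuming the usual formulation in which each group of query heads shares one key/value head. The function name `gqa_reference` and the nested-list layout are hypothetical and chosen only for illustration; this is not the Relax API, and it omits masking, batching, and any KV cache.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gqa_reference(q, k, v):
    """Unmasked grouped-query attention (illustrative only).

    q:    [num_q_heads][seq_q][head_dim]
    k, v: [num_kv_heads][seq_kv][head_dim]
    Returns a nested list shaped like q.
    """
    num_q_heads, num_kv_heads = len(q), len(k)
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    scale = 1.0 / math.sqrt(len(q[0][0]))

    out = []
    for h in range(num_q_heads):
        kv = h // group_size  # query heads in the same group share this KV head
        head_out = []
        for q_vec in q[h]:
            # Scaled dot-product scores against every key of the shared KV head.
            scores = [
                scale * sum(a * b for a, b in zip(q_vec, k_vec))
                for k_vec in k[kv]
            ]
            weights = softmax(scores)
            # Weighted sum of the shared value vectors.
            head_out.append([
                sum(w * v_vec[d] for w, v_vec in zip(weights, v[kv]))
                for d in range(len(v[kv][0]))
            ])
        out.append(head_out)
    return out
```

With `num_kv_heads == num_q_heads` this reduces to ordinary multi-head attention, and with `num_kv_heads == 1` it reduces to multi-query attention, so a native operator could cover all three cases through a single head-count parameter.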
Can I get assigned to this, please, if it's open to community contributions?
@hamzaqureshi5 Absolutely, you can!