FastVideo
[Feature] VSA with larger block size and GQA support
Motivation
I appreciate the amazing work on VSA! It seems that VSA only supports a hardcoded block_size of (64, 64): https://github.com/hao-ai-lab/FastVideo/blob/0aef0e6f6307c6aabeb9774326ec4c2631170a94/csrc/attn/vsa/block_sparse_h100.cu#L11
The current code also has no test or benchmark for GQA support.
I would like to use a larger block size, such as 128, to obtain better MFU, and to use VSA with GQA models. How could I add support for this? (A possible GQA workaround is sketched below.)
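For the GQA part, a minimal sketch of what I have in mind as a workaround is below. It assumes the kernel only accepts an equal number of query and KV heads; `block_sparse_attn` is a hypothetical stand-in for the VSA kernel entry point, not FastVideo's actual API. Expanding KV heads this way wastes memory and bandwidth compared to native GQA support in the kernel, which is why native support would be preferable.

```python
# Sketch only: `block_sparse_attn` is a placeholder for the VSA kernel call,
# not the real FastVideo API.
import torch

def gqa_to_mha_call(q, k, v, block_sparse_attn):
    # q:    [batch, num_q_heads,  seq, head_dim]
    # k, v: [batch, num_kv_heads, seq, head_dim], num_kv_heads < num_q_heads
    num_q_heads = q.shape[1]
    num_kv_heads = k.shape[1]
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads

    # Replicate each KV head group_size times so every query head has a
    # matching KV head, then call the MHA-only kernel unchanged.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return block_sparse_attn(q, k, v)
```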
Related resources
No response
@jzhang38, could you comment?