FastVideo
[Feature] VSA with larger block size and GQA support
Motivation
I appreciate the amazing work on VSA! It seems that VSA only supports a hardcoded block_size of (64, 64): https://github.com/hao-ai-lab/FastVideo/blob/0aef0e6f6307c6aabeb9774326ec4c2631170a94/csrc/attn/vsa/block_sparse_h100.cu#L11
The current code also has no test or benchmark for GQA support.
I would like to use a larger block size, such as 128, to obtain better MFU, and to use VSA with GQA models. How could I add support for this? (A possible GQA workaround is sketched below.)
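For the GQA part, a minimal sketch of what I have in mind as a workaround is below. It assumes the kernel only accepts an equal number of query and KV heads; `block_sparse_attn` is a hypothetical stand-in for the VSA kernel entry point, not FastVideo's actual API. Expanding KV heads this way wastes memory and bandwidth compared to native GQA support in the kernel, which is why native support would be preferable.

```python
# Sketch only: `block_sparse_attn` is a placeholder for the VSA kernel call,
# not the real FastVideo API.
import torch

def gqa_to_mha_call(q, k, v, block_sparse_attn):
    # q:    [batch, num_q_heads,  seq, head_dim]
    # k, v: [batch, num_kv_heads, seq, head_dim], num_kv_heads < num_q_heads
    num_q_heads = q.shape[1]
    num_kv_heads = k.shape[1]
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads

    # Replicate each KV head group_size times so every query head has a
    # matching KV head, then call the MHA-only kernel unchanged.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return block_sparse_attn(q, k, v)
```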
Related resources
No response
@jzhang38, could you comment?