Question about flashinfer constraints
Hi,
I've been exploring the flashinfer implementation and noticed some constraints in dispatch_kv_cache_creation.py:
https://github.com/mlc-ai/mlc-llm/blob/b636b2ac5e0c8bac6cf2a5427c3380fff856447e/python/mlc_llm/compiler_pass/dispatch_kv_cache_creation.py#L200-L221
Could you help me understand:
- The technical rationale behind these limitations (head_dim, group size)?
- Whether mlc-llm with flashinfer could support other configurations (e.g., head_dim=64 or group size=6)? As an experiment, I'm temporarily commenting out these conditions locally.
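For context, my understanding is that the dispatch pass applies a guard roughly of the following shape before routing KV-cache creation to the flashinfer backend. This is a hedged paraphrase, not the actual source: the function name and the specific allowed values (head_dim=128, group sizes 1/4/8) are placeholders I chose for illustration; the real conditions are in the linked lines of `dispatch_kv_cache_creation.py`.

```python
def can_use_flashinfer(head_dim: int, num_qo_heads: int, num_kv_heads: int) -> bool:
    """Hypothetical sketch of a flashinfer-eligibility guard.

    The exact constraints checked by mlc-llm live in
    dispatch_kv_cache_creation.py; the constants below are illustrative
    placeholders, not the authoritative values.
    """
    # Prebuilt flashinfer kernels are typically specialized for a fixed
    # head dimension (128 is assumed here for illustration).
    if head_dim != 128:
        return False
    # GQA group size = query heads per KV head; kernels are often compiled
    # only for a small set of group sizes.
    if num_qo_heads % num_kv_heads != 0:
        return False
    group_size = num_qo_heads // num_kv_heads
    return group_size in (1, 4, 8)


# Examples matching the configurations asked about above:
# a model with head_dim=64, or with 48 query heads over 8 KV heads
# (group size 6), would fail this illustrative guard.
```

If the limitation is only that kernels are pre-compiled for certain (head_dim, group_size) combinations, then commenting out the guard would presumably fail later at kernel dispatch rather than produce wrong results silently, but I'd appreciate confirmation from the maintainers.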
I'd greatly appreciate any insights or documentation references to better understand these boundaries. Thanks for your time and for maintaining this valuable project.