Question about flashinfer constraints
Hi,
I've been exploring the flashinfer implementation and noticed some constraints in dispatch_kv_cache_creation.py:
https://github.com/mlc-ai/mlc-llm/blob/b636b2ac5e0c8bac6cf2a5427c3380fff856447e/python/mlc_llm/compiler_pass/dispatch_kv_cache_creation.py#L200-L221
Could you help me understand:
- The technical rationale behind these limitations (head_dim, group size)?
- Whether mlc-llm with flashinfer could support other configurations (e.g., head_dim=64 or group size=6)? As an experiment, I'm temporarily commenting out these conditions locally.
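For context, my understanding is that the dispatch pass applies a guard roughly of the following shape before routing KV-cache creation to the flashinfer backend. This is a hedged paraphrase, not the actual source: the function name and the specific allowed values (head_dim=128, group sizes 1/4/8) are placeholders I chose for illustration; the real conditions are in the linked lines of `dispatch_kv_cache_creation.py`.

```python
def can_use_flashinfer(head_dim: int, num_qo_heads: int, num_kv_heads: int) -> bool:
    """Hypothetical sketch of a flashinfer-eligibility guard.

    The exact constraints checked by mlc-llm live in
    dispatch_kv_cache_creation.py; the constants below are illustrative
    placeholders, not the authoritative values.
    """
    # Prebuilt flashinfer kernels are typically specialized for a fixed
    # head dimension (128 is assumed here for illustration).
    if head_dim != 128:
        return False
    # GQA group size = query heads per KV head; kernels are often compiled
    # only for a small set of group sizes.
    if num_qo_heads % num_kv_heads != 0:
        return False
    group_size = num_qo_heads // num_kv_heads
    return group_size in (1, 4, 8)


# Examples matching the configurations asked about above:
# a model with head_dim=64, or with 48 query heads over 8 KV heads
# (group size 6), would fail this illustrative guard.
```

If the limitation is only that kernels are pre-compiled for certain (head_dim, group_size) combinations, then commenting out the guard would presumably fail later at kernel dispatch rather than produce wrong results silently, but I'd appreciate confirmation from the maintainers.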
I'd greatly appreciate any insights or documentation references to better understand these boundaries. Thanks for your time and for maintaining this valuable project.