mlc-llm
[Question] In the output of the attention_with_fused_qkv function, some slices have abnormal accuracy
❓ General Questions
Hello, I encountered an issue while deploying with mlc_llm in C++.
The model is Qwen2.5-0.5B.
The KV cache is created with "create_tir_paged_kv_cache".
When running prefill, I found that the result of "paged_kv_cache.attention_with_fused_qkv" did not meet expectations.
The QKV input here is normal, and the output has shape [b, s, hq * d]. Roughly the slice [b, s * 0.78 :, hq * d] of the output is abnormal (prefill tests with different token lengths all follow this rule), and the subsequent results then show a significant accuracy error.
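For reference, here is a minimal numpy sketch of how the divergence can be located (my own helper code, not an mlc_llm API; naive_causal_attention and per_position_error are hypothetical names). It assumes a [b, s, (hq + 2*hkv) * d] fused QKV layout with GQA query heads grouped per KV head, and it omits RoPE, which the paged KV cache kernel can apply to Q/K internally depending on the RoPE mode, so that has to be accounted for before comparing:

```python
# Hypothetical verification sketch (not mlc_llm API). Assumes the fused QKV
# tensor has layout [b, s, (hq + 2*hkv) * d]. RoPE is omitted for brevity;
# either disable it in the kernel or add it to this reference before comparing.
import numpy as np

def naive_causal_attention(qkv, hq, hkv, d):
    b, s, _ = qkv.shape
    q, k, v = np.split(qkv, [hq * d, (hq + hkv) * d], axis=-1)
    q = q.reshape(b, s, hq, d)
    k = k.reshape(b, s, hkv, d)
    v = v.reshape(b, s, hkv, d)
    # Repeat KV heads for grouped-query attention (Qwen2.5-0.5B uses GQA).
    k = np.repeat(k, hq // hkv, axis=2)
    v = np.repeat(v, hq // hkv, axis=2)
    q, k, v = (x.transpose(0, 2, 1, 3) for x in (q, k, v))  # [b, h, s, d]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)       # [b, h, s, s]
    scores += np.triu(np.full((s, s), -np.inf), k=1)        # causal mask for prefill
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = probs @ v                                          # [b, h, s, d]
    return out.transpose(0, 2, 1, 3).reshape(b, s, hq * d)

def per_position_error(actual, reference):
    # Max absolute error per sequence position, shape [b, s]; a jump around
    # position ~0.78 * s would match the abnormal slice described above.
    return np.abs(actual - reference).max(axis=-1)
```

Comparing per_position_error of the kernel output against this reference should show exactly where the divergence begins.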
What could be the reason?
Thanks~