
[Question] In the output of attention_with_fused_qkv, the accuracy of some slices is abnormal

Open ifndefendif opened this issue 9 months ago • 6 comments

❓ General Questions

Hello, I ran into an issue while deploying with mlc_llm in C++. The model is Qwen2.5-0.5B, and the kv_cache is created using "creat_tir_cged_kv_cache". During prefill, the result of "paged_kv_cache.attention_with_fused_qkv" does not match expectations. The qkv input is normal, and the output has shape [b, s, hq * d]. The slice starting at approximately [b, s * 0.78 :, hq * d] is abnormal: the results from that point on show a significant accuracy error. Prefills with different token lengths all follow the same pattern. What could be the reason? Thanks~
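To pin down where along the sequence axis the output starts to diverge, the per-position error against a trusted reference can be measured. The following is only a minimal sketch on synthetic data, assuming the attention output and a reference result for a single batch element have been dumped into nested lists of shape [s][hq * d]. The helper names `per_position_max_abs_err` and `first_bad_position` and the tolerance are hypothetical and are not part of mlc_llm.

```python
def per_position_max_abs_err(out, ref):
    """Return the max absolute error at each sequence position.

    `out` and `ref` are nested lists of shape [s][hq * d] (one batch element).
    """
    if len(out) != len(ref):
        raise ValueError("out and ref must have the same sequence length")
    return [
        max(abs(a - b) for a, b in zip(row_out, row_ref))
        for row_out, row_ref in zip(out, ref)
    ]


def first_bad_position(out, ref, tol=1e-2):
    """Return the first sequence index whose error exceeds `tol`, or None."""
    for i, err in enumerate(per_position_max_abs_err(out, ref)):
        if err > tol:
            return i
    return None


# Synthetic stand-in data (assumption): s = 9 positions, 4 values per position.
ref = [[0.1 * (i + j) for j in range(4)] for i in range(9)]
out = [row[:] for row in ref]
# Corrupt the tail of the sequence to imitate the reported pattern.
for i in range(7, 9):
    out[i] = [v + 0.5 for v in out[i]]

start = first_bad_position(out, ref)
if start is not None:
    print(f"first bad position: {start} of {len(ref)} ({start / len(ref):.2f} of s)")
```

Running the same check for several prefill lengths would show whether the divergence always begins at a fixed fraction of s or at a fixed absolute position.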

ifndefendif, Jan 17 '25 03:01