mlc-llm
[Question] In the output of the attention_with_fused_qkv function, some slices have abnormal accuracy
❓ General Questions
Hello, I encountered an issue while deploying with mlc_llm in C++.
The model is Qwen2.5-0.5B.
The KV cache is created with "create_tir_paged_kv_cache".
When running prefill, I found that the result of "paged_kv_cache.attention_with_fused_qkv" did not meet expectations.
The QKV input here is normal, and the output has shape [b, s, hq * d]. Roughly the slice [b, s * 0.78 :, hq * d] of the output is abnormal (prefill tests with different token lengths all follow this rule), and the subsequent results then show a significant accuracy error.
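For reference, here is a minimal numpy sketch of how the divergence can be located (my own helper code, not an mlc_llm API; naive_causal_attention and per_position_error are hypothetical names). It assumes a [b, s, (hq + 2*hkv) * d] fused QKV layout with GQA query heads grouped per KV head, and it omits RoPE, which the paged KV cache kernel can apply to Q/K internally depending on the RoPE mode, so that has to be accounted for before comparing:

```python
# Hypothetical verification sketch (not mlc_llm API). Assumes the fused QKV
# tensor has layout [b, s, (hq + 2*hkv) * d]. RoPE is omitted for brevity;
# either disable it in the kernel or add it to this reference before comparing.
import numpy as np

def naive_causal_attention(qkv, hq, hkv, d):
    b, s, _ = qkv.shape
    q, k, v = np.split(qkv, [hq * d, (hq + hkv) * d], axis=-1)
    q = q.reshape(b, s, hq, d)
    k = k.reshape(b, s, hkv, d)
    v = v.reshape(b, s, hkv, d)
    # Repeat KV heads for grouped-query attention (Qwen2.5-0.5B uses GQA).
    k = np.repeat(k, hq // hkv, axis=2)
    v = np.repeat(v, hq // hkv, axis=2)
    q, k, v = (x.transpose(0, 2, 1, 3) for x in (q, k, v))  # [b, h, s, d]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)       # [b, h, s, s]
    scores += np.triu(np.full((s, s), -np.inf), k=1)        # causal mask for prefill
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = probs @ v                                          # [b, h, s, d]
    return out.transpose(0, 2, 1, 3).reshape(b, s, hq * d)

def per_position_error(actual, reference):
    # Max absolute error per sequence position, shape [b, s]; a jump around
    # position ~0.78 * s would match the abnormal slice described above.
    return np.abs(actual - reference).max(axis=-1)
```

Comparing per_position_error of the kernel output against this reference should show exactly where the divergence begins.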
What could be the reason?
Thanks~