llama.cpp
Bug: Qwen2-72B-Instruct (and finetunes) Q4_K_M, Q5_K_M generates random output with cuBLAS prompt processing
What happened?
Qwen2-72B-Instruct Q4_K_M generates output made of random tokens (numbers, special symbols, random chunks of words from different languages, etc.).
Has been tested on:
- Tesla P40 24 GB with partial offload (half of the layers on the GPU, the rest on the CPU)
- Inference fully in RAM (on a different PC from the first)
Other people report that Q6 works, so the problem may be specific to Q4_K_M (I can't test Q6 myself).
I've tried FlashAttention on and off and MMQ on and off; none of the combinations help.
I tested with the llama.cpp binaries, koboldcpp, and text-generation-webui; the bug reproduces in all of them. A reproduction sketch is given below.
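For reference, a command along these lines should reproduce the problem (the model filename, prompt, and -ngl value are illustrative, not the exact command I ran; -ngl sets how many layers are offloaded to the GPU and -fa toggles FlashAttention; in builds older than the main-to-llama-cli rename the binary is called main):

  llama-cli -m Qwen2-72B-Instruct-Q4_K_M.gguf -ngl 40 -fa -p "Write a short poem about the sea."

Running the same command with -ngl 0 (or a CPU-only build) corresponds to the second test case above and also produces garbage output.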
Related: https://github.com/LostRuins/koboldcpp/issues/909
Name and Version
version: 3181 (37bef894) built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output
No response