
Eval bug: Qwen3 Q4_0 not working with SYCL

Open · invent00 opened this issue 7 months ago · 7 comments

Name and Version

version: 5215 (5f5e39e1) built with MSVC 19.43.34808.0

Operating systems

Windows

GGML backends

SYCL

Hardware

Core Ultra 5 125U, 32 GB RAM (ThinkPad X1 Carbon Gen 12), driver version 32.0.101.6739

Models

Qwen3-4B-gguf Q4_0 (https://huggingface.co/unsloth/Qwen3-4B-GGUF/tree/main)

Problem description & steps to reproduce

When attempting inference with this model, the screen briefly goes black and inference fails. The Q4_K_M quant of the same model, however, works normally.

In addition, the CUDA build (cu11.7, b5215) works properly with Q4_0.

How to reproduce:

  1. Run llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf
  2. Enter a prompt
  3. The screen blacks out

In the Windows Event Log, llama-cli.exe reports the following application error (screenshot attached).

First Bad Commit

No response

Relevant log output

1. Run llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf
2. Enter a prompt
3. The screen blacks out

invent00 · Apr 29 '25 02:04

I've also had issues with Q4_0 quants on SYCL resulting in the screen going black and crashing on my Arc A770M. I experienced this with Gemma 3 12B QAT as well as Llama 2 7B when running performance benchmarks. I believe the SYCL Q4_0 reorder optimizations caused this, since setting GGML_SYCL_DISABLE_OPT=1 allowed things to run normally again.
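For reference, the workaround is just setting the variable in the same shell before launching llama-cli; roughly (Windows cmd shown, model path is whatever you use):

  set GGML_SYCL_DISABLE_OPT=1
  llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf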

Sketchfellow · Apr 29 '25 03:04

I believe the SYCL Q4_0 reorder optimizations resulted in this as setting GGML_SYCL_DISABLE_OPT=1 allowed things to run normally again

cc @Rbiessy @NeoZhangJianyu @Alcpz ^

qnixsynapse · Apr 29 '25 03:04

Hi @Sketchfellow, thank you for your advice. After setting GGML_SYCL_DISABLE_OPT=1, it works properly.

invent00 · Apr 29 '25 05:04

I've been able to reproduce the issue, but only on Windows. Linux seems unaffected. As reported, GGML_SYCL_DISABLE_OPT=1 works without problem. There seems to be something wrong with the reorder, but I would need to have a deeper look at it.

Alcpz · Apr 30 '25 14:04

Let me check!

NeoZhangJianyu · May 05 '25 06:05

@invent00 https://github.com/ggml-org/llama.cpp/pull/13109 should fix this issue. Could you check if it works for you? :) Note that you will need to set/export the environment variable GGML_SYCL_DISABLE_OPT=0 to trigger the reorder codepath which was causing the issue.
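In case it helps, a rough sketch of how to try the PR locally; the exact CMake/compiler flags depend on your oneAPI setup (see docs/backend/SYCL.md), and the local branch name here is arbitrary:

  git fetch origin pull/13109/head:pr-13109
  git checkout pr-13109
  cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx
  cmake --build build --config Release
  set GGML_SYCL_DISABLE_OPT=0
  build\bin\llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf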

sgeor255 · May 06 '25 14:05

Hi @sgeor255, I built d7e5179 and tried it. It works properly with GGML_SYCL_DISABLE_OPT=0 and Qwen3-4B-Q4_0.gguf.

I confirmed GGML_SYCL_DISABLE_OPT=0 is faster than GGML_SYCL_DISABLE_OPT=1.
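(For comparison, one way to measure the difference is llama-bench with the variable toggled between runs; a sketch, same model as above:)

  set GGML_SYCL_DISABLE_OPT=1
  llama-bench.exe -m Qwen3-4B-Q4_0.gguf -ngl 99
  set GGML_SYCL_DISABLE_OPT=0
  llama-bench.exe -m Qwen3-4B-Q4_0.gguf -ngl 99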

Once this is merged into main, I will close this issue.

invent00 · May 08 '25 14:05

Hi,

I confirmed that version 5402 (0a338ed0) with GGML_SYCL_DISABLE_OPT=0 also works properly.

Let me close the issue. Thank you for your support.

invent00 · May 16 '25 12:05