Eval bug: Qwen3 Q4_0 not working with SYCL
Name and Version
version: 5215(5f5e39e1) built with MSVC 19343.34808.0
Operating systems
Windows
GGML backends
SYCL
Hardware
Core Ultra 5 125U, 32 GB memory (ThinkPad X1 Carbon Gen 12), driver version 32.0.101.6739
Models
Qwen3-4B-gguf Q4_0 (https://huggingface.co/unsloth/Qwen3-4B-GGUF/tree/main)
Problem description & steps to reproduce
When I attempt inference with this model, the screen briefly goes black and inference fails. However, the Q4_K_M model runs normally.
In addition, the CUDA build (cu11.7, b5215) works properly with Q4_0.
How to reproduce:
- llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf
- Enter a question
- The screen blacks out
In the event log, llama-cli.exe shows the following application error
First Bad Commit
No response
Relevant log output
1. llama-cli.exe -ngl 99 -m Qwen3-4B-Q4_0.gguf
2. Enter a question
3. The screen blacks out
I've also had issues with Q4_0 quants on SYCL: the screen goes black and it crashes on my Arc A770M. I experienced this with Gemma 3 12B QAT as well as Llama 2 7B when running performance benchmarks. I believe the SYCL Q4_0 reorder optimizations cause this, since setting GGML_SYCL_DISABLE_OPT=1 allowed things to run normally again.
cc @Rbiessy @NeoZhangJianyu @Alcpz ^
Hi @Sketchfellow,
Thank you for your advice. After setting GGML_SYCL_DISABLE_OPT=1, it works properly.
I've been able to reproduce the issue, but only on Windows. Linux seems unaffected. As reported, GGML_SYCL_DISABLE_OPT=1 works without problem. There seems to be something wrong with the reorder, but I would need to have a deeper look at it.
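For anyone who wants to apply the workaround from a script instead of the shell, here is a minimal sketch. Only the environment variable and the command-line flags come from this thread; the helper names `sycl_env` and `llama_cli_command` are made up for illustration, and the script assumes llama-cli.exe can be found on PATH and that the model file is in the working directory.

```python
import os
import shutil
import subprocess


def sycl_env(disable_opt: bool) -> dict:
    """Return a copy of the current environment with GGML_SYCL_DISABLE_OPT set."""
    env = dict(os.environ)
    env["GGML_SYCL_DISABLE_OPT"] = "1" if disable_opt else "0"
    return env


def llama_cli_command(model_path: str, exe: str = "llama-cli.exe") -> list:
    """Build the same command line used in the reproduction steps."""
    return [exe, "-ngl", "99", "-m", model_path]


if __name__ == "__main__":
    cmd = llama_cli_command("Qwen3-4B-Q4_0.gguf")
    # Assumption: llama-cli.exe is on PATH; skip the launch if it is not.
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; nothing to run")
    else:
        # disable_opt=True applies the workaround (reorder optimizations off).
        subprocess.run(cmd, env=sycl_env(disable_opt=True), check=False)
```

In cmd.exe the equivalent is to run `set GGML_SYCL_DISABLE_OPT=1` before launching llama-cli.exe in the same window.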
Let me check!
@invent00 https://github.com/ggml-org/llama.cpp/pull/13109 should fix this issue. Could you check if it works for you? :) Note that you will need to set/export the environment variable GGML_SYCL_DISABLE_OPT=0 to trigger the reorder codepath which was causing the issue.
@sgeor255 Hi, I built d7e5179 and tried it. It works properly with GGML_SYCL_DISABLE_OPT=0 + Qwen3-4B-Q4_0.gguf.
I confirmed GGML_SYCL_DISABLE_OPT=0 is faster than GGML_SYCL_DISABLE_OPT=1.
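A rough way to repeat this comparison is sketched below. It is not how the numbers above were measured: the helper `time_command` is hypothetical, and the script assumes llama-bench.exe is on PATH and that the model file is in the working directory. It only records wall-clock time, which includes model loading, so the tokens-per-second figures printed by llama-bench itself are the better numbers to compare.

```python
import os
import shutil
import subprocess
import time


def time_command(cmd: list, disable_opt: str) -> float:
    """Run cmd with GGML_SYCL_DISABLE_OPT=disable_opt; return wall-clock seconds."""
    env = dict(os.environ)
    env["GGML_SYCL_DISABLE_OPT"] = disable_opt
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: llama-bench.exe is on PATH and the model is in the working directory.
    cmd = ["llama-bench.exe", "-m", "Qwen3-4B-Q4_0.gguf", "-ngl", "99"]
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; nothing to run")
    else:
        for value in ("0", "1"):
            elapsed = time_command(cmd, value)
            print(f"GGML_SYCL_DISABLE_OPT={value}: {elapsed:.1f} s")
```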
Once this is merged into main, I will close this issue.
Hi,
I confirmed that it works properly on version 5402 (0a338ed0).
It also works properly with GGML_SYCL_DISABLE_OPT=0.
Let me close the issue. Thank you for your support.