
Deepseek-r1:8b-0528-qwen3-fp16 cannot run on the IPEX-LLM 20250725 version.

Open · LIwii1 opened this issue 5 months ago · 5 comments

Deepseek-r1:8b-0528-qwen3-fp16 can't run on the IPEX-LLM 20250725 version. I have also tried earlier versions, and they cannot run it either. The Gemma3 QAT version also cannot run.

Device information: GPU: A770 16GB; RAM: 16GB

Logs are as follows:

G:\ollama-ipex-llm-2.3.0b20250725-win>ollama run deepseek-r1:8b-0528-qwen3-fp16
Error: llama runner process has terminated: error:CHECK_TRY_ERROR((*stream).memcpy((char *) tensor->data + offset, data, size).wait()): Exception caught in this line of code.
in function ggml_backend_sycl_buffer_set_tensor at D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:432
D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\..\ggml-sycl\common.hpp:115: SYCL error

LIwii1 · Jul 29 '25 16:07

Hi @LIwii1, I think this error is caused by running out of GPU memory. The deepseek-r1:8b-0528-qwen3-fp16 model itself requires about 16GB for its weights alone (roughly 8B parameters × 2 bytes per FP16 parameter), which leaves no headroom on a 16GB A770 for the KV cache and activations. You could try a lower precision instead.

rnwang04 · Jul 30 '25 01:07

> Hi @LIwii1, I think this error is caused by running out of GPU memory. The deepseek-r1:8b-0528-qwen3-fp16 model itself requires about 16GB. You could try a lower precision instead.

deepseek-r1:8b-llama-distill-fp16 can run on earlier versions, such as 20250220.

LIwii1 · Jul 30 '25 05:07

> Hi @LIwii1, I think this error is caused by running out of GPU memory. The deepseek-r1:8b-0528-qwen3-fp16 model itself requires about 16GB. You could try a lower precision instead.

> deepseek-r1:8b-llama-distill-fp16 can run on earlier versions, such as 20250220.

Different ollama versions and driver versions consume slightly different amounts of GPU memory for the same model. However, deepseek-r1:8b-llama-distill-fp16 will always use approximately 16GB of GPU memory, so it can easily run out of memory on the A770 when the input gets longer.
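On the input-length point: most of the growth comes from the KV cache, which scales with the context window, so if a model loads but runs out of memory on long prompts, capping num_ctx can help. A minimal sketch, assuming the bundled ollama supports the standard /set parameter command (any tag works here):

G:\ollama-ipex-llm-2.3.0b20250725-win>ollama run deepseek-r1:8b-0528-qwen3-fp16
>>> /set parameter num_ctx 2048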

Can you try int4 or int8 low precision, or other low-memory features? An example follows below.
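For example, assuming the standard tag naming in the ollama library (q4_K_M holds roughly 5GB of weights for an 8B model and q8_0 roughly 8.5GB, so both leave headroom on a 16GB A770):

G:\ollama-ipex-llm-2.3.0b20250725-win>ollama run deepseek-r1:8b-0528-qwen3-q4_K_M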

qiyuangong · Jul 31 '25 00:07

Deepseek-r1:8b-0528-qwen3-q4_K_M can run, but the model's output quality drops significantly, so we would like to use the FP16 version. In addition, the Gemma3 12B QAT version also cannot run.

LIwii1 · Aug 02 '25 21:08

> Deepseek-r1:8b-0528-qwen3-q4_K_M can run, but the model's output quality drops significantly, so we would like to use the FP16 version. In addition, the Gemma3 12B QAT version also cannot run.

What about https://ollama.com/library/deepseek-r1:8b-0528-qwen3-q8_0 or the q6_K variant?
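q8_0 should stay closest to FP16 quality while using roughly half the weight memory. Assuming the tag linked above, the invocation would be:

G:\ollama-ipex-llm-2.3.0b20250725-win>ollama run deepseek-r1:8b-0528-qwen3-q8_0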

qiyuangong · Aug 03 '25 13:08