Deepseek-r1:8b-0528-qwen3-fp16 cannot run on the IPEX-LLM 20250725 version.
Deepseek-r1:8b-0528-qwen3-fp16 can't run on the IPEX-LLM 20250725 version. I have also tried earlier versions, and they cannot run it either. The Gemma3 QAT version also cannot run.
Device information: GPU: A770 16GB; RAM: 16GB
Logs are as follows:
G:\ollama-ipex-llm-2.3.0b20250725-win>ollama run deepseek-r1:8b-0528-qwen3-fp16
Error: llama runner process has terminated: error:CHECK_TRY_ERROR((*stream).memcpy((char *) tensor->data + offset, data, size).wait()): Exception caught in this line of code.
in function ggml_backend_sycl_buffer_set_tensor at D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:432
D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\..\ggml-sycl\common.hpp:115: SYCL error
Hi @LIwii1, I think this error is caused by running out of GPU memory. The deepseek-r1:8b-0528-qwen3-fp16 model itself requires 16GB. You could try a lower precision instead.
deepseek-r1:8b-llama-distill-fp16 can run on earlier versions, such as 20250220.
Different Ollama versions and driver versions consume slightly different amounts of GPU memory for the same model. However, deepseek-r1:8b-llama-distill-fp16 always uses approximately 16GB of GPU memory, so it will easily run out of memory on the A770 once the input gets longer.
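For a rough sense of the numbers: an 8B-parameter model in FP16 needs about 8e9 parameters x 2 bytes/parameter = ~16GB for the weights alone, before counting the KV cache, activations, and the runtime's own buffers, so a 16GB A770 has essentially no headroom for it.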
Could you try int4 or int8 low precision, or other low-memory features?
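For example, something like the following should fit comfortably in 16GB (the q4_K_M tag is from the ollama library; /set parameter num_ctx is a standard ollama interactive command for capping the context length, and 2048 here is just an illustrative value):

ollama run deepseek-r1:8b-0528-qwen3-q4_K_M
>>> /set parameter num_ctx 2048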
Deepseek-r1:8b-0528-qwen3-q4_K_M can run, but the model's quality drops significantly, so we want to use the FP16 version. In addition, the Gemma3 12B QAT version cannot run either.
What about https://ollama.com/library/deepseek-r1:8b-0528-qwen3-q8_0 or Q6_K?
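For instance (q8_0 stores roughly 1 byte per parameter, so the weights come to about half the FP16 footprint, which leaves room for the KV cache on a 16GB card with much less quality loss than q4_K_M):

ollama run deepseek-r1:8b-0528-qwen3-q8_0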