
Fail to run gemma3 in the IPEX Ollama released in 3.19

Open larria opened this issue 9 months ago • 18 comments

Describe the bug: gemma3 fails to run in the IPEX Ollama released on 3.19. The latest documentation was strictly followed: I updated to the latest version and set the environment variable `IPEX_LLM_MODEL_SOURCE=modelscope` that gemma3 needs.

Meanwhile, I tried phi4 on it and it ran well, much faster than via llama.cpp.

How to reproduce

Steps to reproduce the error on the terminal:

```
PS C:\Soft\AI\ollama-ipex> .\start-ollama.bat
PS C:\Soft\AI\ollama-ipex> set IPEX_LLM_MODEL_SOURCE=modelscope
PS C:\Soft\AI\ollama-ipex> .\ollama.exe run gemma3:27b
ggml_sycl_init: found 1 SYCL devices:
pulling manifest
pulling 6a2cf0085006... 100% ▕████████████████████████████████████████████████████████▏  16 GB
pulling e0a42594d802... 100% ▕████████████████████████████████████████████████████████▏  358 B
pulling d80b22736c51... 100% ▕████████████████████████████████████████████████████████▏   47 B
pulling dd084c7d92a3... 100% ▕████████████████████████████████████████████████████████▏ 8.4 KB
pulling 54cb61c842fe... 100% ▕████████████████████████████████████████████████████████▏ 857 MB
pulling 556c7538d4d4... 100% ▕████████████████████████████████████████████████████████▏  195 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: error:CHECK_TRY_ERROR(ctx->stream->memset( (char *)tensor->data + original_size, 0, padded_size - original_size).wait()): Meet error in this line code!
  in function ggml_backend_sycl_buffer_init_tensor at D:\actions-runner\release-cpp-oneapi_2024_2\_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:320
```
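[Editor's note] A side observation, not necessarily the cause of the crash: in PowerShell, `set NAME=value` is cmd.exe syntax and does not create an environment variable, so a child `ollama.exe` process may never see `IPEX_LLM_MODEL_SOURCE`. A minimal sketch of the equivalents (the PowerShell line is the assumed fix, not confirmed by this thread):

```shell
# cmd.exe syntax (what `set NAME=value` targets):
#   set IPEX_LLM_MODEL_SOURCE=modelscope
# PowerShell equivalent, which child processes actually inherit:
#   $env:IPEX_LLM_MODEL_SOURCE = "modelscope"
# POSIX shell equivalent, as used on the Linux side of this thread:
export IPEX_LLM_MODEL_SOURCE=modelscope
echo "$IPEX_LLM_MODEL_SOURCE"   # prints: modelscope
```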

Environment information: Windows 11, Ultra 7 258V; the disk only has a C: drive with no other partitions (maybe the reason?).

larria avatar Mar 20 '25 04:03 larria

I have the same problem on Linux with phi4 and gemma3. I fear the ollama version needs to be updated!

Thanks.

chbacher avatar Mar 20 '25 14:03 chbacher

Could you try running gemma3:4b to confirm whether it's related to VRAM?

sgwhat avatar Mar 21 '25 02:03 sgwhat

Loading phi4-mini results in:

```
Error: llama runner process has terminated: error loading model: missing tensor 'output.weight'
llama_load_model_from_file: failed to load model

llama_model_load: error loading model: missing tensor 'output.weight'
llama_load_model_from_file: failed to load model
panic: unable to load model:
```

This error was due to the outdated Ollama version 0.5.4 and has been resolved!

Gemma 3 stops with this error:

```
Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade
```

chbacher avatar Mar 21 '25 15:03 chbacher

I have seen a similar error several times while running qwen2.5-coder:32b-instruct-q5_K_M.

ollama version is 0.5.4-ipexllm-20250228

```
Error: an error was encountered while running the model: error:CHECK_TRY_ERROR((*stream).memcpy((char *)tensor->data + offset, host_buf, size) .wait()): Meet error in this line code!
  in function ggml_backend_sycl_buffer_set_tensor at D:\actions-runner\release-cpp-oneapi_2024_2\_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:341
D:\actions-runner\release-cpp-oneapi_2024_2\_work\llm.cpp\llm.cpp\ollama-llama-cpp\ggml\src\ggml-sycl\..\ggml-sycl\common.hpp:107: SYCL error
```

tsobczynski avatar Mar 22 '25 14:03 tsobczynski

Could you try running gemma3:4b to confirm whether it's related to VRAM?

I tried gemma3:4b and it also failed, but the error message is different:

```
PS C:\Soft\AI\ollama-ipex> .\start-ollama.bat
PS C:\Soft\AI\ollama-ipex> set IPEX_LLM_MODEL_SOURCE=modelscope
PS C:\Soft\AI\ollama-ipex> .\ollama.exe run gemma3:4b
ggml_sycl_init: found 1 SYCL devices:
pulling manifest
pulling be49949e4842... 100% ▕████████████████████████████████████████████████████████▏ 2.5 GB
pulling e0a42594d802... 100% ▕████████████████████████████████████████████████████████▏  358 B
pulling d80b22736c51... 100% ▕████████████████████████████████████████████████████████▏   47 B
pulling dd084c7d92a3... 100% ▕████████████████████████████████████████████████████████▏ 8.4 KB
pulling 8c0fb064b019... 100% ▕████████████████████████████████████████████████████████▏ 851 MB
pulling b90a6002a6e9... 100% ▕████████████████████████████████████████████████████████▏  194 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: exit status 2
```

larria avatar Mar 23 '25 12:03 larria

```
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade
```

xinyuxinyuan1 avatar Mar 24 '25 06:03 xinyuxinyuan1

This won't work until Intel updates Ollama.

wltechblog avatar Mar 25 '25 15:03 wltechblog

Hi all, we are working on upgrading the ipex-llm ollama version to 0.6.2 to re-support gemma3. Until then, you may run gemma3:1b. For more details, please see https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md.

sgwhat avatar Mar 26 '25 01:03 sgwhat

Hi all, we are working on upgrading the ipex-llm ollama version to 0.6.2 to re-support gemma3. Until then, you may run gemma3:1b. For more details, please see https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md.

Thanks and good job!

larria avatar Mar 26 '25 02:03 larria

After `export IPEX_LLM_MODEL_SOURCE=modelscope`, `ollama run gemma3:1b "what’s worth more, three quarters or 100 pennies."` runs normally.

But `./ollama run gemma3:12b "what’s worth more, three quarters or 100 pennies."` fails with the following error messages.

```
ggml_sycl_init: found 2 SYCL devices:
pulling manifest
pulling 9610e3e07375... 100% ▕████████████████████████████████████████████████████████▏ 7.3 GB
pulling e0a42594d802... 100% ▕████████████████████████████████████████████████████████▏  358 B
pulling 40b6b1a43dcd... 100% ▕████████████████████████████████████████████████████████▏   73 B
pulling dd084c7d92a3... 100% ▕████████████████████████████████████████████████████████▏ 8.4 KB
pulling 30c02d056410... 100% ▕████████████████████████████████████████████████████████▏ 854 MB
pulling c547fa61e23d... 100% ▕████████████████████████████████████████████████████████▏  195 B
verifying sha256 digest
writing manifest
success
Error: llama runner process has terminated: exit status 2
```

chungyehwangai avatar Mar 28 '25 08:03 chungyehwangai

@chungyehwangai, we are working on upgrading ipex-llm ollama to v0.6.x, which will support gemma3:4b/12b/27b; we should be able to release it soon.

sgwhat avatar Mar 31 '25 01:03 sgwhat

Has this been fixed in the latest IPEX-LLM v2.2.0 release?

tristan-k avatar Apr 08 '25 15:04 tristan-k

Still fails to run gemma3:{4b,12b} with ollama-ipex-llm-2.2.0-ubuntu

[Error Log]

[1]

```
ollama-ipex-llm-2.2.0-ubuntu$ ./ollama run gemma3
ggml_sycl_init: found 1 SYCL devices:
pulling manifest
Error: pull model manifest: 412:

The model you are attempting to pull requires a newer version of Ollama.

Please download the latest version at:

        https://ollama.com/download
```

[2]

```
$ export IPEX_LLM_MODEL_SOURCE=modelscope
$ ./ollama run gemma3:1b "Hi" --verbose
ggml_sycl_init: found 1 SYCL devices:
Hey there! How’s your day going? 😊

Is there anything you’d like to chat about, or just want to pass the time? Do you have any questions for me, or would you like to:

  • Tell me something: A random fact, a story, or whatever pops into your head.
  • Play a game: We could do something simple like 20 Questions or a word association.
  • Just chat: We can talk about anything – hobbies, current events, music, etc.

total duration:       2.197152529s
load duration:        12.167828ms
prompt eval count:    10 token(s)
prompt eval duration: 447ms
prompt eval rate:     22.37 tokens/s
eval count:           115 token(s)
eval duration:        1.736s
eval rate:            66.24 tokens/s
```

```
(base) eapet@eapet-ARL-285:~/ws/ollama-ipex-llm-2.2.0-ubuntu$ ./ollama run gemma3:4b "Hi" --verbose
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2

(base) eapet@eapet-ARL-285:~/ws/ollama-ipex-llm-2.2.0-ubuntu$ ./ollama run gemma3:12b "Hi" --verbose
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2
```

chungyehwangai avatar Apr 09 '25 01:04 chungyehwangai

Hi @tristan-k @chungyehwangai , I will release an initial version to support gemma3, maybe next Monday or Tuesday.

sgwhat avatar Apr 11 '25 01:04 sgwhat

Hi @tristan-k @chungyehwangai , I will release an initial version to support gemma3, maybe next Monday or Tuesday.

Does the 2.3.0-nightly build add support for Gemma3?

tristan-k avatar Apr 22 '25 08:04 tristan-k

Does the 2.3.0-nightly build add support for Gemma3?

We have added support for gemma3-fp16.

sgwhat avatar Apr 22 '25 11:04 sgwhat

I just ran `llama-cli -m F:\models\ollama\gemma-3-12b-it-q4_0.gguf -p "Hi" --verbose` and got:

```
...
load_tensors: tensor 'token_embd.weight' (f16) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
...
llama_kv_cache_init: SYCL0 KV buffer size = 1536.00 MiB
llama_init_from_model: KV self size = 1536.00 MiB, K (f16): 768.00 MiB, V (f16): 768.00 MiB
llama_init_from_model: SYCL_Host output buffer size = 1.00 MiB
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 62914560 Bytes of memory on device
ggml_gallocr_reserve_n: failed to allocate SYCL0 buffer of size 4357881856
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 62914560 Bytes of memory on device
ggml_gallocr_reserve_n: failed to allocate SYCL0 buffer of size 4357881856
llama_init_from_model: failed to allocate compute buffers
common_init_from_params: failed to create context with model 'F:\models\ollama\gemma-3-12b-it-q4_0.gguf'
main: error: unable to load model
```
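[Editor's note] For reference, the 1536 MiB KV buffer in a log like the one above is consistent with a straightforward f16 KV-cache estimate. A sketch, assuming the published Gemma 3 12B attention shape (48 layers, 8 KV heads, head_dim 256) and llama.cpp's default 4096-token context; these parameters are assumptions on my part, not taken from the log:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt=2):
    """Estimate f16 KV-cache size: K and V each store one head_dim vector
    per layer, per KV head, per context position."""
    per_token = n_layers * n_kv_heads * head_dim * bytes_per_elt
    return 2 * per_token * ctx_len  # x2 for K and V

# Assumed Gemma 3 12B attention shape; ctx 4096 is llama.cpp's default.
total = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=256, ctx_len=4096)
print(total / (1024 ** 2))  # -> 1536.0 (MiB), matching the log
```

The allocation that actually fails here is the ~4.06 GiB compute buffer on top of that, which points at the device running out of memory rather than at a model-format problem.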

bli1348 avatar Apr 24 '25 15:04 bli1348

I have seen a similar error several times while running qwen2.5-coder:32b-instruct-q5_K_M.

ollama version is 0.5.4-ipexllm-20250228

I wonder how to check this version? When I run `ollama --version`, it just shows 0.0.0.
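[Editor's note] One possible way to check, assuming the portable build's server still embeds its version string even when the CLI prints 0.0.0: ask the running server over its default port via the `GET /api/version` endpoint of the Ollama HTTP API. A sketch:

```python
import json
import urllib.request

def ollama_server_version(host="http://localhost:11434"):
    """Ask a running Ollama server for its version via GET /api/version."""
    with urllib.request.urlopen(f"{host}/api/version", timeout=5) as resp:
        return json.loads(resp.read())["version"]

# The endpoint returns JSON like the sample below; in ipex-llm builds the
# build tag (if present) follows the base Ollama version.
sample = json.loads('{"version": "0.5.4-ipexllm-20250228"}')
print(sample["version"].split("-", 1)[0])  # prints: 0.5.4
```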

bli1348 avatar Apr 24 '25 15:04 bli1348