What is the current ollama version?
v0.5.4, and we are working on releasing v0.6.2 for Linux first.
Will a version supporting v0.6.2 for Linux be released within the next two weeks?
v0.6.2 has been released. You may install it via pip install --pre --upgrade ipex-llm[cpp].
That's confusing. You install ollama by installing --pre ipex-llm?
You can get v0.6.2 packages from https://github.com/intel/ipex-llm/releases/tag/v2.3.0-nightly now.
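To untangle the two routes mentioned above: pip install --pre --upgrade ipex-llm[cpp] pulls in an IPEX-LLM build of ollama that you expose with the init-ollama helper described in the ipex-llm quickstart, while the release page above ships the same thing as a portable zip. A minimal sketch of the pip route on Linux (helper name and flow taken from the quickstart; verify against the docs for your version):

```bash
# Sketch of the pip route, per the ipex-llm llama.cpp/ollama quickstart.
# Assumes a Python env (e.g. conda) and Intel GPU drivers are already set up.
pip install --pre --upgrade ipex-llm[cpp]

mkdir -p ollama-ipex && cd ollama-ipex
init-ollama          # creates symlinks to the bundled ollama binary here

./ollama serve       # in another shell: ./ollama run <model>
```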
Ah. The portable binary. Would building the container via https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md also result in this version?
@kirel Yes, it's the latest version we put on PyPI, see https://github.com/intel/ipex-llm/blob/73198d5b80dd584de58fc3625ca0cdf78b4f8e42/docker/llm/inference-cpp/Dockerfile#L61
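For the container route, a rough sequence based on that quickstart looks like the following; the image name and flags come from the guide, so double-check the current tag before relying on this:

```bash
# Pull the prebuilt image referenced in the quickstart (check the guide for
# the current tag) and expose the Intel GPU to the container.
docker pull intelanalytics/ipex-llm-inference-cpp-xpu:latest

docker run -itd \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  --name=ipex-llm-ollama \
  intelanalytics/ipex-llm-inference-cpp-xpu:latest

# The ollama build inside is the one pinned in the Dockerfile line linked above.
docker exec -it ipex-llm-ollama bash
```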
(llm-cpp) ubuntu@ubuntu-NUC12SNKi72:~/ollama-ipex-llm-2.3.0b20250415-ubuntu$ ./ollama --version
ollama version is 0.0.0
I have an Intel Arc B580 Graphics with 12 GB VRAM, driver version 1.6.32567+18.
I tried both updating my current Ollama install via pip install --pre --upgrade ipex-llm[cpp]
and the portable package (Portable.2.30v.zip), with the same result for Gemma3.
It works with most other LLMs, for example Mistral-nemo, deepseek-r1, and phi4, but when I try Gemma3 it always errors as follows:
time=2025-04-21T15:28:14.524+10:00 level=INFO source=ggml.go:369 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-04-21T15:28:14.524+10:00 level=INFO source=ggml.go:369 msg="compute graph" backend=CPU buffer_type=SYCL_Host
time=2025-04-21T15:28:14.525+10:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-04-21T15:28:14.529+10:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-21T15:28:14.532+10:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-21T15:28:14.739+10:00 level=INFO source=server.go:628 msg="llama runner started in 2.68 seconds"
[GIN] 2025/04/21 - 15:28:14 | 200 | 2.903381703s | 127.0.0.1 | POST "/api/generate"
panic: failed to sample token: no tokens to sample from
I have a second server running two Tesla P4s that runs Gemma3 with zero issues. Hopefully this will get fixed for my Intel card, as it is heaps faster than the Tesla P4s.
Thank you.
I am also seeing the exact same issue with Gemma3 on my Arc A770 using the ollama portable binary. Other models run fine, just not Gemma3. Additionally, I'm unable to load a model larger than VRAM and have it split between CPU and GPU like I can with mainline Ollama.
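On mainline Ollama the CPU/GPU split is driven by the num_gpu option (the number of layers offloaded to the GPU); whether the SYCL/ipex-llm build honors it the same way is something I haven't confirmed, but it is the first knob I would test:

```bash
# Ask the server to offload only part of the layers to the GPU and keep the
# rest on the CPU. num_gpu is a standard Ollama request option; its behavior
# on the SYCL backend is an assumption to verify.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:12b",
  "prompt": "hello",
  "options": { "num_gpu": 24 }
}'
```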
https://github.com/chnxq/ollama/tree/chnxq/add-oneapi
You can try the above one. It uses the latest ollama version, and gemma3:12b works, but it has only been tested on Windows. ref: /llama/README-Intel-OneApi.md
https://github.com/intel/ipex-llm/issues/13070
Someone has also tested it on Linux.
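For Linux, the actual build steps live in llama/README-Intel-OneApi.md on that branch; the sketch below only covers fetching the branch and loading the oneAPI environment (default install path assumed), not the build itself:

```bash
# Fetch the oneAPI-enabled branch mentioned above.
git clone -b chnxq/add-oneapi https://github.com/chnxq/ollama.git
cd ollama

# Load the oneAPI toolchain (assumes the default installation prefix).
source /opt/intel/oneapi/setvars.sh

# From here, follow llama/README-Intel-OneApi.md in this repo for the actual
# build commands; they may differ between Windows and Linux.
```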
I will try to install it on Linux. Thank you for your work.