
What is the current ollama version?

Open brownplayer opened this issue 8 months ago • 13 comments

brownplayer avatar Apr 10 '25 01:04 brownplayer

v0.5.4, and we are working on releasing v0.6.2 for Linux first.

sgwhat avatar Apr 10 '25 02:04 sgwhat

Will a version supporting v0.6.2 for Linux be released within the next two weeks?

HelloMorningStar avatar Apr 14 '25 04:04 HelloMorningStar

v0.6.2 has been released. You may install it via pip install --pre --upgrade ipex-llm[cpp].
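For context: the ipex-llm[cpp] package bundles the Ollama binaries, and a helper script links them into your working directory. A minimal sketch, assuming the init-ollama helper described in the ipex-llm Ollama quickstart:

    # install the bundled llama.cpp/Ollama binaries (pre-release channel)
    pip install --pre --upgrade ipex-llm[cpp]
    # symlink the bundled ollama binary into the current directory
    init-ollama            # init-ollama.bat on Windows
    # then run it as usual
    ./ollama serve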

sgwhat avatar Apr 16 '25 07:04 sgwhat

That's confusing. You install ollama by installing --pre ipex-llm?

kirel avatar Apr 16 '25 18:04 kirel

You can get v0.6.2 packages from https://github.com/intel/ipex-llm/releases/tag/v2.3.0-nightly now.
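Roughly, for the portable build (a sketch only; the exact archive name under that tag may differ, and the directory name here is taken from a later comment in this thread):

    # unpack the downloaded archive (exact file name/format on the release page may differ)
    unzip ollama-ipex-llm-2.3.0b20250415-ubuntu.zip
    cd ollama-ipex-llm-2.3.0b20250415-ubuntu
    # start the server in one terminal
    ./ollama serve
    # and pull/run a model from another
    ./ollama run gemma3:4b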

qiuxin2012 avatar Apr 17 '25 01:04 qiuxin2012

Ah. The portable binary. Would building the container via https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/docker_cpp_xpu_quickstart.md also result in this version?

kirel avatar Apr 17 '25 06:04 kirel

@kirel Yes, it's the latest version we put on PyPI, see https://github.com/intel/ipex-llm/blob/73198d5b80dd584de58fc3625ca0cdf78b4f8e42/docker/llm/inference-cpp/Dockerfile#L61
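So rebuilding that image picks up whatever is newest on PyPI at build time. A rough sketch, with a made-up image tag (see the Docker quickstart linked above for the full set of run flags):

    # from a checkout of intel/ipex-llm
    docker build -t ipex-llm-cpp-xpu -f docker/llm/inference-cpp/Dockerfile .
    # the Intel GPU has to be passed through via the render device
    docker run -it --net=host --device=/dev/dri ipex-llm-cpp-xpu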

qiuxin2012 avatar Apr 18 '25 02:04 qiuxin2012

(llm-cpp) ubuntu@ubuntu-NUC12SNKi72:~/ollama-ipex-llm-2.3.0b20250415-ubuntu$ ./ollama --version
ollama version is 0.0.0
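Side note: a 0.0.0 report presumably just means no version string was stamped into that build, not that nothing is installed. To confirm which server instance you are actually talking to and what it reports, you can also hit the standard Ollama version endpoint:

    # query the running server (default port 11434)
    curl http://localhost:11434/api/version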

HelloMorningStar avatar Apr 20 '25 14:04 HelloMorningStar

I have an Intel Arc B580 with 12 GB of VRAM, driver version 1.6.32567+18.

I tried both updating my current Ollama install via pip install --pre --upgrade ipex-llm[cpp]

and also the portable build (Portable.2.30v.zip), with the same result for Gemma3.

It works with most other LLMs, for example mistral-nemo, deepseek-r1, phi4 and so on, but when I try Gemma3 it always errors as follows:

time=2025-04-21T15:28:14.524+10:00 level=INFO source=ggml.go:369 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-04-21T15:28:14.524+10:00 level=INFO source=ggml.go:369 msg="compute graph" backend=CPU buffer_type=SYCL_Host
time=2025-04-21T15:28:14.525+10:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-04-21T15:28:14.529+10:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-21T15:28:14.532+10:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-21T15:28:14.538+10:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-21T15:28:14.739+10:00 level=INFO source=server.go:628 msg="llama runner started in 2.68 seconds"
[GIN] 2025/04/21 - 15:28:14 | 200 | 2.903381703s | 127.0.0.1 | POST "/api/generate"
panic: failed to sample token: no tokens to sample from
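If it helps triage, a rough way to capture a fuller trace with the portable build (OLLAMA_DEBUG is a standard Ollama setting; the log file name is just an example):

    # start the server with verbose logging and keep the output
    OLLAMA_DEBUG=1 ./ollama serve 2>&1 | tee gemma3-crash.log
    # in a second terminal, trigger the failure
    ./ollama run gemma3:12b "hello"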

I have a second server running two Tesla P4s that runs Gemma3 with zero issues. Hopefully this will get fixed to work with my Intel card, as it is much faster than the Tesla P4s.

Thank you.

coderexpress avatar Apr 21 '25 05:04 coderexpress

I am also seeing the exact same issue with Gemma3 on my Arc A770 using the ollama-portable binary. Everything else seems to run fine, but not Gemma3. Additionally, I'm unable to load a model larger than VRAM and have it split between CPU and GPU like I can with mainline Ollama.
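For the CPU/GPU split part: mainline Ollama controls that via the num_gpu option (the number of layers offloaded to the GPU); whether the ipex-llm build honors it the same way is an open question, but the request-level knob looks like this:

    # ask the server to offload only ~20 layers to the GPU and keep the rest on CPU
    curl http://localhost:11434/api/generate -d '{
      "model": "gemma3:12b",
      "prompt": "hello",
      "options": { "num_gpu": 20 }
    }'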

afterthought325 avatar Apr 22 '25 13:04 afterthought325

https://github.com/chnxq/ollama/tree/chnxq/add-oneapi

You can try the branch above. It uses the latest ollama version, and gemma3:12b works, but it has only been tested on Windows. Ref: /llama/README-Intel-OneApi.md in that repo.
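A sketch of trying that branch on Linux, assuming a working Go toolchain plus oneAPI; the actual build steps are in the linked README, so treat this as orientation only:

    # grab the oneAPI-enabled branch
    git clone -b chnxq/add-oneapi https://github.com/chnxq/ollama.git
    cd ollama
    # make the oneAPI compilers and runtime visible, then follow llama/README-Intel-OneApi.md
    source /opt/intel/oneapi/setvars.sh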

chnxq avatar Apr 22 '25 15:04 chnxq

https://github.com/intel/ipex-llm/issues/13070

Someone has also tested it on Linux.

chnxq avatar Apr 22 '25 15:04 chnxq

https://github.com/chnxq/ollama/tree/chnxq/add-oneapi

You can try the branch above. It uses the latest ollama version, and gemma3:12b works, but it has only been tested on Windows. Ref: /llama/README-Intel-OneApi.md in that repo.

I will try to install it on Linux. Thank you for your work.

coderexpress avatar Apr 22 '25 22:04 coderexpress