Ollama-ipex cannot support Gemma3 4B, Gemma3 12B, or phi4-mini 3.8B, but the official Ollama can.
Please add support. Ollama-ipex cannot run the Gemma3 4B and Gemma3 12B multimodal models, nor the phi4-mini 3.8B LLM, while the official Ollama can.
Here it says the current Ollama Portable Zip is based on Ollama v0.6.2, but with the latest v2.2.0 release, ollama's version is still v0.5.4:
ggml_sycl_init: found 1 SYCL devices:
ollama version is 0.5.4-ipexllm-20250331
See #13184
We have completed initial Ollama multimodal engine support, which should cover a series of multimodal models, including Gemma 3.
You may install our latest version via pip install --pre --upgrade ipex-llm[cpp] or download the zip here: https://www.modelscope.cn/models/Intel/ollama
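If it helps, a quick way to confirm which Ollama build is actually running is to ask the server itself via its standard /api/version endpoint. A minimal sketch using only the Python standard library; it assumes the default listen address 127.0.0.1:11434 (adjust if OLLAMA_HOST is set):

```python
# Minimal sketch (not part of ipex-llm): ask a running `ollama serve`
# which version it reports, via the standard /api/version endpoint.
# Assumes the default listen address 127.0.0.1:11434.
import json
import urllib.request

def ollama_version(host: str = "http://127.0.0.1:11434") -> str:
    with urllib.request.urlopen(f"{host}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]

if __name__ == "__main__":
    # IPEX-LLM builds report strings like "0.5.4-ipexllm-20250331"
    # or "intel-ollama-0.6.2", as seen in the logs in this thread.
    print(ollama_version())
```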
Thanks for the link! However, even with the new version:
PS D:\Download\ollama-intel-2.3.0b20250516-win> ./ollama run gemma3:12b
Error: llama runner process has terminated: error:index out of range [-1]
goroutine 24 [running]:
github.com/ollama/ollama/ml/backend/ggml.New.func6(...)
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:169
github.com/ollama/ollama/ml/backend/ggml.New(0xc00012e3b8, {0x6,
Hi, gemma3 is supported from ipex-llm[cpp]==2.3.0b20250529; phi4-mini support is still a work in progress.
You could try it again with pip install ipex-llm[cpp]==2.3.0b20250529.
Unsure if I am doing something wrong... going to move to llama.cpp and try my luck.
(ipex-llm-test1) thehoff@overmind:~/ipex-ollama/ipex-llm-test1$ pip install ipex-llm[cpp]==2.3.0rc1
Collecting ipex-llm==2.3.0rc1 (from ipex-llm[cpp]==2.3.0rc1) Using cached ipex_llm-2.3.0rc1-py3-none-manylinux2010_x86_64.whl.metadata (8.8 kB) INFO: pip is looking at multiple versions of ipex-llm[cpp] to determine which version is compatible with other requirements. This could take a while. ERROR: Could not find a version that satisfies the requirement bigdl-core-cpp==2.7.0rc1; extra == "cpp" (from ipex-llm[cpp]) (from versions: 2.1.0b2, 2.5.0, 2.6.0b1, 2.6.0b2, 2.6.0b20230911, 2.6.0b20250201, 2.6.0b20250202, 2.6.0b20250203, 2.6.0b20250204, 2.6.0b20250207, 2.6.0b20250209, 2.6.0b20250210, 2.6.0b20250211, 2.6.0b20250212, 2.6.0b20250213, 2.6.0b20250214, 2.6.0b20250215, 2.6.0b20250216, 2.6.0b20250217, 2.6.0b20250218, 2.6.0b20250219, 2.6.0b20250220, 2.6.0b20250221, 2.6.0b20250222, 2.6.0b20250223, 2.6.0b20250224, 2.6.0b20250225, 2.6.0b20250225.post0, 2.6.0b20250226, 2.6.0b20250227, 2.6.0b20250228, 2.6.0b20250230, 2.6.0b20250231, 2.6.0b20250301, 2.6.0b20250302, 2.6.0b20250303, 2.6.0b20250304, 2.6.0b20250305, 2.6.0b20250306, 2.6.0b20250307, 2.6.0b20250308, 2.6.0b20250309, 2.6.0b20250310, 2.6.0b20250311, 2.6.0b20250311.post0, 2.6.0b20250312, 2.6.0b20250312.post0, 2.6.0b20250313, 2.6.0b20250313.post0, 2.6.0b20250314, 2.6.0b20250315, 2.6.0b20250316, 2.6.0b20250317, 2.6.0b20250318, 2.6.0b20250318.post0, 2.6.0b20250319, 2.6.0b20250320, 2.6.0b20250324, 2.6.0b20250325, 2.6.0b20250326, 2.6.0b20250327, 2.6.0b20250328, 2.6.0b20250329, 2.6.0b20250329.post1, 2.6.0b20250329.post2, 2.6.0b20250329.post3, 2.6.0b20250330, 2.6.0b20250331, 2.6.0b20250401, 2.6.0b20250402, 2.6.0b20250403, 2.6.0b20250404, 2.6.0b20250405, 2.6.0b20250406, 2.6.0, 2.7.0b20250407, 2.7.0b20250408, 2.7.0b20250409, 2.7.0b20250410, 2.7.0b20250411, 2.7.0b20250412, 2.7.0b20250413, 2.7.0b20250414, 2.7.0b20250415, 2.7.0b20250416, 2.7.0b20250416.post0, 2.7.0b20250417, 2.7.0b20250418, 2.7.0b20250419, 2.7.0b20250420, 2.7.0b20250421, 2.7.0b20250422, 2.7.0b20250423, 2.7.0b20250425, 2.7.0b20250426, 2.7.0b20250427, 2.7.0b20250428, 2.7.0b20250429, 2.7.0b20250429.post2, 2.7.0b20250430, 2.7.0b20250430.post0, 2.7.0b20250430.post1, 2.7.0b20250501, 2.7.0b20250502, 2.7.0b20250503, 2.7.0b20250504, 2.7.0b20250505, 2.7.0b20250506, 2.7.0b20250507, 2.7.0b20250512, 2.7.0b20250513, 2.7.0b20250514, 2.7.0b20250515, 2.7.0b20250517, 2.7.0b20250518, 2.7.0b20250519, 2.7.0b20250520, 2.7.0b20250521, 2.7.0b20250522, 2.7.0b20250523, 2.7.0b20250524, 2.7.0b20250526, 2.7.0b20250527, 2.7.0b20250528, 2.7.0b20250530, 2.7.0b20250531) ERROR: No matching distribution found for bigdl-core-cpp==2.7.0rc1; extra == "cpp"
I got the same.
Hi @thehoff, 2.3.0rc1 is an older version; please try pip install ipex-llm[cpp]==2.3.0b20250530.
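To double-check what pip actually installed after switching versions, something like the following sketch can help; the package names (ipex-llm, bigdl-core-cpp) are taken from the resolver output above, and this is only an illustrative convenience check, not part of ipex-llm itself:

```python
# Sketch: report which versions of the packages named in the resolver error
# are actually installed in the current environment.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("ipex-llm", "bigdl-core-cpp"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```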
Hi all,
We have released the Ollama portable zip with Gemma3 vision support. Please download ollama-ipex-llm-2.3.0b20250612-win/ubuntu.zip from the following link: Ollama Portable Zip Release.
To get started, follow the portable zip guide available here: Ollama Portable Zip Quickstart Guide.
Give it a try and let us know your feedback!
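For anyone who wants to script a quick vision check against the new build, below is a minimal sketch that sends an image to a Gemma3 model through Ollama's standard /api/generate endpoint (base64-encoded images in the images field). The model tag, image path, and host are placeholders; adjust them for your setup:

```python
# Sketch: Gemma3 vision smoke test against a running `ollama serve`.
# Uses the standard Ollama REST API (/api/generate with base64 "images");
# "gemma3:4b" and "test.jpg" are placeholders.
import base64
import json
import urllib.request

def describe_image(path: str, model: str = "gemma3:4b",
                   host: str = "http://127.0.0.1:11434") -> str:
    with open(path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    payload = json.dumps({
        "model": model,
        "prompt": "Describe this image in one sentence.",
        "images": [img_b64],
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(describe_image("test.jpg"))
```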
I'm having weird issues with ollama-ipex-llm-2.3.0b20250612 running on an Intel Battlemage B580 (12GB).
gemma3:12b tries to allocate more system memory than is available. gemma3:4b does not fill system RAM, but it fills the GPU VRAM.
See the errors below:
——> ./ollama run gemma3:12b
pulling manifest
pulling e8ad13eff07a... 100% ▕███████████████████████████████████████████████████████████████▏ 8.1 GB
pulling e0a42594d802... 100% ▕███████████████████████████████████████████████████████████████▏ 358 B
pulling dd084c7d92a3... 100% ▕███████████████████████████████████████████████████████████████▏ 8.4 KB
pulling 3116c5225075... 100% ▕███████████████████████████████████████████████████████████████▏ 77 B
pulling 6819964c2bcf... 100% ▕███████████████████████████████████████████████████████████████▏ 490 B
verifying sha256 digest
writing manifest
success
Error: model requires more system memory (226.1 GiB) than is available (106.4 GiB)
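The 226 GiB request appears to be dominated by the KV cache: the serve log further down shows the runner being started with --ctx-size 131072 --parallel 4, and the 12b estimate is presumably computed for the same full context. Assuming this build honors per-request options, asking for a smaller context via the standard Ollama num_ctx option should shrink that allocation considerably; a hedged sketch (the model tag and context size are just examples):

```python
# Sketch: run gemma3:12b with a much smaller context window so the KV-cache
# allocation stays within memory. num_ctx is the standard Ollama option;
# whether this particular build honors it is an assumption.
import json
import urllib.request

payload = json.dumps({
    "model": "gemma3:12b",
    "prompt": "hello",
    "stream": False,
    "options": {"num_ctx": 8192},  # example value, far below the 131072 seen in the log
}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=600) as resp:
    print(json.load(resp)["response"])
```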
This one fills the VRAM and crashes:
——> ./ollama run gemma3:4b --verbose
hello there can you give me 1000 words on why humans are great
Error: POST predict: Post "http://127.0.0.1:35101/completion": EOF
Here are the logs from (start_ollama.sh) ollama serve:
——> ./start-ollama.sh
2025/06/13 09:25:42 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/guido/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-06-13T09:25:42.623-04:00 level=INFO source=images.go:432 msg="total blobs: 0"
time=2025-06-13T09:25:42.623-04:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers) [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers) [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers) [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers) [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers) [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers) [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers) [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers) [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers) [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers) [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers) [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers) [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers) [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers) [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers) [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers) [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(Server).GenerateHandler-fm (6 handlers) [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(Server).EmbedHandler-fm (6 handlers) [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(Server).ListHandler-fm (6 handlers) [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(Server).ShowHandler-fm (6 handlers) time=2025-06-13T09:25:42.624-04:00 level=INFO source=routes.go:1297 msg="Listening on 127.0.0.1:11434" time=2025-06-13T09:25:42.624-04:00 level=INFO source=routes.go:1298 msg="Version intel-ollama-0.6.2" time=2025-06-13T09:25:42.624-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-06-13T09:25:42.624-04:00 level=INFO source=gpu.go:218 msg="using Intel GPU" time=2025-06-13T09:25:42.632-04:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2025-06-13T09:25:42.632-04:00 level=WARN source=amd_linux.go:443 msg="amdgpu detected, but no compatible rocm library found. 
Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install" time=2025-06-13T09:25:42.632-04:00 level=WARN source=amd_linux.go:348 msg="unable to verify rocm library: no suitable rocm found, falling back to CPU" time=2025-06-13T09:25:42.632-04:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="59.4 GiB" available="50.1 GiB" [GIN] 2025/06/13 - 09:25:46 | 200 | 50.208µs | 127.0.0.1 | GET "/api/version" [GIN] 2025/06/13 - 09:25:56 | 200 | 17.139µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:25:56 | 200 | 128.137µs | 127.0.0.1 | GET "/api/tags" [GIN] 2025/06/13 - 09:27:14 | 200 | 14.369µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:27:14 | 404 | 115.937µs | 127.0.0.1 | POST "/api/show" [GIN] 2025/06/13 - 09:27:14 | 200 | 247.174474ms | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:27:21 | 200 | 16.449µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:27:21 | 404 | 60.868µs | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:27:21.789-04:00 level=INFO source=download.go:176 msg="downloading e8ad13eff07a in 16 509 MB part(s)" time=2025-06-13T09:28:55.603-04:00 level=INFO source=download.go:294 msg="e8ad13eff07a part 12 attempt 0 failed: unexpected EOF, retrying in 1s" [GIN] 2025/06/13 - 09:29:36 | 200 | 2m15s | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:29:50 | 200 | 14.36µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:29:50 | 200 | 37.409µs | 127.0.0.1 | GET "/api/tags" [GIN] 2025/06/13 - 09:30:02 | 200 | 17.289µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:30:02 | 404 | 64.399µs | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:30:03.275-04:00 level=INFO source=download.go:176 msg="downloading e8ad13eff07a in 16 509 MB part(s)" [GIN] 2025/06/13 - 09:30:25 | 200 | 22.029µs | 127.0.0.1 | GET "/api/version" time=2025-06-13T09:33:09.172-04:00 level=INFO source=download.go:294 msg="e8ad13eff07a part 2 attempt 0 failed: unexpected EOF, retrying in 1s" [GIN] 2025/06/13 - 09:36:05 | 200 | 19.7µs | 127.0.0.1 | GET "/api/version" time=2025-06-13T09:37:03.060-04:00 level=INFO source=download.go:176 msg="downloading e0a42594d802 in 1 358 B part(s)" time=2025-06-13T09:37:04.243-04:00 level=INFO source=download.go:176 msg="downloading dd084c7d92a3 in 1 8.4 KB part(s)" time=2025-06-13T09:37:05.631-04:00 level=INFO source=download.go:176 msg="downloading 3116c5225075 in 1 77 B part(s)" time=2025-06-13T09:37:06.840-04:00 level=INFO source=download.go:176 msg="downloading 6819964c2bcf in 1 490 B part(s)" [GIN] 2025/06/13 - 09:37:11 | 200 | 7m7s | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:37:11 | 200 | 29.468556ms | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:37:11.974-04:00 level=INFO source=server.go:107 msg="system memory" total="59.4 GiB" free="49.3 GiB" free_swap="57.1 GiB" time=2025-06-13T09:37:11.974-04:00 level=WARN source=server.go:135 msg="model request too large for system" requested="226.1 GiB" available=114262691840 total="59.4 GiB" free="49.3 GiB" swap="57.1 GiB" time=2025-06-13T09:37:11.974-04:00 level=INFO source=sched.go:429 msg="NewLlamaServer failed" model=/home/guido/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de error="model requires more system memory (226.1 GiB) than is available (106.4 GiB)" [GIN] 2025/06/13 - 09:37:11 | 500 | 84.960273ms | 127.0.0.1 | POST "/api/generate" [GIN] 2025/06/13 - 09:38:38 | 200 | 16.04µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:38:38 | 404 | 
149.818µs | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:38:38.515-04:00 level=INFO source=download.go:176 msg="downloading aeda25e63ebd in 16 208 MB part(s)" time=2025-06-13T09:42:10.800-04:00 level=INFO source=download.go:176 msg="downloading b6ae5839783f in 1 489 B part(s)" [GIN] 2025/06/13 - 09:42:13 | 200 | 3m35s | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:42:13 | 200 | 28.294458ms | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:42:13.583-04:00 level=INFO source=server.go:107 msg="system memory" total="59.4 GiB" free="50.0 GiB" free_swap="57.1 GiB" time=2025-06-13T09:42:13.583-04:00 level=INFO source=server.go:154 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[50.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="87.2 GiB" memory.required.partial="0 B" memory.required.kv="68.0 GiB" memory.required.allocations="[49.7 GiB]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="9.0 GiB" memory.graph.partial="13.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-06-13T09:42:13.648-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+" time=2025-06-13T09:42:13.651-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-06-13T09:42:13.653-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+" time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-06-13T09:42:13.658-04:00 level=INFO source=server.go:430 msg="starting llama server" cmd="/home/guido/ipex-llm/ollama-ipex-llm-2.3.0b20250612-ubuntu/ollama-bin runner --ollama-engine --model /home/guido/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 131072 --batch-size 512 --n-gpu-layers 999 --threads 6 --no-mmap --parallel 4 --port 35101" time=2025-06-13T09:42:13.658-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-06-13T09:42:13.658-04:00 level=INFO source=server.go:605 msg="waiting for llama runner to start responding" time=2025-06-13T09:42:13.658-04:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server error" time=2025-06-13T09:42:13.685-04:00 level=INFO source=runner.go:757 msg="starting ollama engine" time=2025-06-13T09:42:13.685-04:00 level=INFO source=runner.go:817 msg="Server listening on 127.0.0.1:35101" time=2025-06-13T09:42:13.748-04:00 level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-06-13T09:42:13.748-04:00 level=WARN 
source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-06-13T09:42:13.748-04:00 level=INFO source=ggml.go:79 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded SYCL backend from /home/guido/ipex-llm/ollama-ipex-llm-2.3.0b20250612-ubuntu/libggml-sycl.so
load_backend: loaded CPU backend from /home/guido/ipex-llm/ollama-ipex-llm-2.3.0b20250612-ubuntu/libggml-cpu-skylakex.so
time=2025-06-13T09:42:13.795-04:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 1 SYCL devices:
| ID | Device Type        | Name                    | Version | Max compute units | Max work group | sub group | Global mem size | Driver version |
|----|--------------------|-------------------------|---------|-------------------|----------------|-----------|-----------------|----------------|
|  0 | [level_zero:gpu:0] | Intel Arc B580 Graphics | 20.1    | 160               | 1024           | 32        | 12168M          | 1.6.33578+11   |
SYCL Optimization Feature:
| ID | Device Type        | Reorder |
|----|--------------------|---------|
|  0 | [level_zero:gpu:0] | Y       |
time=2025-06-13T09:42:13.909-04:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server loading model"
time=2025-06-13T09:42:13.950-04:00 level=INFO source=ggml.go:349 msg="Number of model weight buffers" count=2
time=2025-06-13T09:42:13.950-04:00 level=INFO source=ggml.go:352 msg="model weights" buffer=CPU size="525.0 MiB"
time=2025-06-13T09:42:13.950-04:00 level=INFO source=ggml.go:352 msg="model weights" buffer=SYCL0 size="3.1 GiB"
time=2025-06-13T09:42:14.945-04:00 level=INFO source=ggml.go:462 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-06-13T09:42:14.945-04:00 level=INFO source=ggml.go:462 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-06-13T09:42:14.945-04:00 level=INFO source=ggml.go:462 msg="compute graph" backend=CPU buffer_type=SYCL_Host
time=2025-06-13T09:42:14.946-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-06-13T09:42:14.949-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-06-13T09:42:14.951-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-13T09:42:15.167-04:00 level=INFO source=server.go:644 msg="llama runner started in 1.51 seconds"
[GIN] 2025/06/13 - 09:42:15 | 200 | 1.667611544s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/06/13 - 09:45:22 | 200 | 19.928µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/06/13 - 09:45:22 | 200 | 27.876264ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/06/13 - 09:45:22 | 200 | 27.719209ms | 127.0.0.1 | POST "/api/generate"
Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:428, func:operator()
SYCL error: CHECK_TRY_ERROR( stream->memset(dest_ptr, static_cast
  in function ggml_backend_sycl_buffer_memset_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:428
/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:117: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7a4338aa50bc m=46 sigcode=18446744073709551610
signal arrived during cgo execution
Well, I am not an ollama developer, but I know there is a sycl-ls-device exe in the zip folder, and it works for me for checking that the environment is runnable. I think you can give it a try.
Hi all, the new portable zip is uploaded here: https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-win.zip / https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-ubuntu.tgz. In this new portable zip, we have upgraded ollama to v0.9.3. Gemma3 4B, Gemma3 12B, and phi4-mini 3.8B should work now; hope you enjoy it. 😊
@rnwang04, this is working now. Thanks for all the work on this. One thing: on my Intel Battlemage B580 12GB, a lot of these smaller models seem to be using most of the GPU memory. For example, Gemma3 4B uses around 10 GB of VRAM. Is anyone else seeing this? From what I have seen in the past, older ollama versions used less VRAM for these models.
edit: after a reboot and specifying some of the suggested env variables, the gemma3:4b model seems to use 5 GB of VRAM, and gemma3:12b also fits in memory with 11 GB of VRAM.
gemma3:4b -> eval rate: 61.42 tokens/s
gemma3:12b -> eval rate: 33.30 tokens/s
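For anyone comparing VRAM use across builds, the server's /api/ps endpoint (the same data ollama ps prints, and one of the routes visible in the serve log above) reports each loaded model's total size and how much of it resides in GPU memory; a small sketch, again assuming the default address:

```python
# Sketch: show each loaded model's total size and GPU-resident size via the
# standard /api/ps endpoint (same information as `ollama ps`).
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:11434/api/ps", timeout=5) as resp:
    for m in json.load(resp).get("models", []):
        total_gib = m.get("size", 0) / 2**30
        vram_gib = m.get("size_vram", 0) / 2**30
        print(f"{m['name']}: total {total_gib:.1f} GiB, in VRAM {vram_gib:.1f} GiB")
```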