Ollama-ipex cannot support Gemma3 4B, Gemma3 12B, or phi4-mini 3.8B, but the official Ollama can.
Please add support. Ollama-ipex cannot run the Gemma3 4B and Gemma3 12B multimodal models, nor the phi4-mini 3.8B LLM, while the official Ollama can.
Here it says the current Ollama Portable Zip is based on Ollama v0.6.2, but with the latest v2.2.0 release, ollama's version is still v0.5.4:
ggml_sycl_init: found 1 SYCL devices:
ollama version is 0.5.4-ipexllm-20250331
See #13184
We have completed initial Ollama multimodal engine support, which should cover a series of multimodal models, including Gemma 3.
You may install our latest version via pip install --pre --upgrade ipex-llm[cpp] or download the zip here: https://www.modelscope.cn/models/Intel/ollama
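If it helps, a quick way to confirm which Ollama build is actually running is to ask the server itself via its standard /api/version endpoint. A minimal sketch using only the Python standard library; it assumes the default listen address 127.0.0.1:11434 (adjust if OLLAMA_HOST is set):

```python
# Minimal sketch (not part of ipex-llm): ask a running `ollama serve`
# which version it reports, via the standard /api/version endpoint.
# Assumes the default listen address 127.0.0.1:11434.
import json
import urllib.request

def ollama_version(host: str = "http://127.0.0.1:11434") -> str:
    with urllib.request.urlopen(f"{host}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]

if __name__ == "__main__":
    # IPEX-LLM builds report strings like "0.5.4-ipexllm-20250331"
    # or "intel-ollama-0.6.2", as seen in the logs in this thread.
    print(ollama_version())
```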
Thanks for the link! However, even with the new version:
PS D:\Download\ollama-intel-2.3.0b20250516-win> ./ollama run gemma3:12b
Error: llama runner process has terminated: error:index out of range [-1]
goroutine 24 [running]:
github.com/ollama/ollama/ml/backend/ggml.New.func6(...)
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:169
github.com/ollama/ollama/ml/backend/ggml.New(0xc00012e3b8, {0x6,
Hi, gemma3 is supported from ipex-llm[cpp]==2.3.0b20250529; phi4-mini support is still a work in progress.
You could try it again with pip install ipex-llm[cpp]==2.3.0b20250529.
Unsure if I am doing something wrong... going to move to llama.cpp and try my luck.
(ipex-llm-test1) thehoff@overmind:~/ipex-ollama/ipex-llm-test1$ pip install ipex-llm[cpp]==2.3.0rc1
Collecting ipex-llm==2.3.0rc1 (from ipex-llm[cpp]==2.3.0rc1) Using cached ipex_llm-2.3.0rc1-py3-none-manylinux2010_x86_64.whl.metadata (8.8 kB) INFO: pip is looking at multiple versions of ipex-llm[cpp] to determine which version is compatible with other requirements. This could take a while. ERROR: Could not find a version that satisfies the requirement bigdl-core-cpp==2.7.0rc1; extra == "cpp" (from ipex-llm[cpp]) (from versions: 2.1.0b2, 2.5.0, 2.6.0b1, 2.6.0b2, 2.6.0b20230911, 2.6.0b20250201, 2.6.0b20250202, 2.6.0b20250203, 2.6.0b20250204, 2.6.0b20250207, 2.6.0b20250209, 2.6.0b20250210, 2.6.0b20250211, 2.6.0b20250212, 2.6.0b20250213, 2.6.0b20250214, 2.6.0b20250215, 2.6.0b20250216, 2.6.0b20250217, 2.6.0b20250218, 2.6.0b20250219, 2.6.0b20250220, 2.6.0b20250221, 2.6.0b20250222, 2.6.0b20250223, 2.6.0b20250224, 2.6.0b20250225, 2.6.0b20250225.post0, 2.6.0b20250226, 2.6.0b20250227, 2.6.0b20250228, 2.6.0b20250230, 2.6.0b20250231, 2.6.0b20250301, 2.6.0b20250302, 2.6.0b20250303, 2.6.0b20250304, 2.6.0b20250305, 2.6.0b20250306, 2.6.0b20250307, 2.6.0b20250308, 2.6.0b20250309, 2.6.0b20250310, 2.6.0b20250311, 2.6.0b20250311.post0, 2.6.0b20250312, 2.6.0b20250312.post0, 2.6.0b20250313, 2.6.0b20250313.post0, 2.6.0b20250314, 2.6.0b20250315, 2.6.0b20250316, 2.6.0b20250317, 2.6.0b20250318, 2.6.0b20250318.post0, 2.6.0b20250319, 2.6.0b20250320, 2.6.0b20250324, 2.6.0b20250325, 2.6.0b20250326, 2.6.0b20250327, 2.6.0b20250328, 2.6.0b20250329, 2.6.0b20250329.post1, 2.6.0b20250329.post2, 2.6.0b20250329.post3, 2.6.0b20250330, 2.6.0b20250331, 2.6.0b20250401, 2.6.0b20250402, 2.6.0b20250403, 2.6.0b20250404, 2.6.0b20250405, 2.6.0b20250406, 2.6.0, 2.7.0b20250407, 2.7.0b20250408, 2.7.0b20250409, 2.7.0b20250410, 2.7.0b20250411, 2.7.0b20250412, 2.7.0b20250413, 2.7.0b20250414, 2.7.0b20250415, 2.7.0b20250416, 2.7.0b20250416.post0, 2.7.0b20250417, 2.7.0b20250418, 2.7.0b20250419, 2.7.0b20250420, 2.7.0b20250421, 2.7.0b20250422, 2.7.0b20250423, 2.7.0b20250425, 2.7.0b20250426, 2.7.0b20250427, 2.7.0b20250428, 2.7.0b20250429, 2.7.0b20250429.post2, 2.7.0b20250430, 2.7.0b20250430.post0, 2.7.0b20250430.post1, 2.7.0b20250501, 2.7.0b20250502, 2.7.0b20250503, 2.7.0b20250504, 2.7.0b20250505, 2.7.0b20250506, 2.7.0b20250507, 2.7.0b20250512, 2.7.0b20250513, 2.7.0b20250514, 2.7.0b20250515, 2.7.0b20250517, 2.7.0b20250518, 2.7.0b20250519, 2.7.0b20250520, 2.7.0b20250521, 2.7.0b20250522, 2.7.0b20250523, 2.7.0b20250524, 2.7.0b20250526, 2.7.0b20250527, 2.7.0b20250528, 2.7.0b20250530, 2.7.0b20250531) ERROR: No matching distribution found for bigdl-core-cpp==2.7.0rc1; extra == "cpp"
I got the same.
Hi @thehoff, 2.3.0rc1 is an older version; please try pip install ipex-llm[cpp]==2.3.0b20250530.
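To double-check what pip actually installed after switching versions, something like the following sketch can help; the package names (ipex-llm, bigdl-core-cpp) are taken from the resolver output above, and this is only an illustrative convenience check, not part of ipex-llm itself:

```python
# Sketch: report which versions of the packages named in the resolver error
# are actually installed in the current environment.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("ipex-llm", "bigdl-core-cpp"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```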
Hi all,
We have released the Ollama portable zip with Gemma3 vision support. Please download ollama-ipex-llm-2.3.0b20250612-win/ubuntu.zip from the following link: Ollama Portable Zip Release.
To get started, follow the portable zip guide available here: Ollama Portable Zip Quickstart Guide.
Give it a try and let us know your feedback!
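For anyone who wants to script a quick vision check against the new build, below is a minimal sketch that sends an image to a Gemma3 model through Ollama's standard /api/generate endpoint (base64-encoded images in the images field). The model tag, image path, and host are placeholders; adjust them for your setup:

```python
# Sketch: Gemma3 vision smoke test against a running `ollama serve`.
# Uses the standard Ollama REST API (/api/generate with base64 "images");
# "gemma3:4b" and "test.jpg" are placeholders.
import base64
import json
import urllib.request

def describe_image(path: str, model: str = "gemma3:4b",
                   host: str = "http://127.0.0.1:11434") -> str:
    with open(path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    payload = json.dumps({
        "model": model,
        "prompt": "Describe this image in one sentence.",
        "images": [img_b64],
        "stream": False,
    }).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(describe_image("test.jpg"))
```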
I'm having weird issues with ollama-ipex-llm-2.3.0b20250612 running on an Intel Battlemage B580 (12GB).
gemma3:12b tries to allocate more system memory than is available. gemma3:4b does not fill system RAM, but it fills the GPU VRAM.
See the errors below:
——> ./ollama run gemma3:12b
pulling manifest
pulling e8ad13eff07a... 100% ▕███████████████████████████████████████████████████████████████▏ 8.1 GB
pulling e0a42594d802... 100% ▕███████████████████████████████████████████████████████████████▏ 358 B
pulling dd084c7d92a3... 100% ▕███████████████████████████████████████████████████████████████▏ 8.4 KB
pulling 3116c5225075... 100% ▕███████████████████████████████████████████████████████████████▏ 77 B
pulling 6819964c2bcf... 100% ▕███████████████████████████████████████████████████████████████▏ 490 B
verifying sha256 digest
writing manifest
success
Error: model requires more system memory (226.1 GiB) than is available (106.4 GiB)
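The 226 GiB request appears to be dominated by the KV cache: the serve log further down shows the runner being started with --ctx-size 131072 --parallel 4, and the 12b estimate is presumably computed for the same full context. Assuming this build honors per-request options, asking for a smaller context via the standard Ollama num_ctx option should shrink that allocation considerably; a hedged sketch (the model tag and context size are just examples):

```python
# Sketch: run gemma3:12b with a much smaller context window so the KV-cache
# allocation stays within memory. num_ctx is the standard Ollama option;
# whether this particular build honors it is an assumption.
import json
import urllib.request

payload = json.dumps({
    "model": "gemma3:12b",
    "prompt": "hello",
    "stream": False,
    "options": {"num_ctx": 8192},  # example value, far below the 131072 seen in the log
}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=600) as resp:
    print(json.load(resp)["response"])
```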
This one fills the VRAM and crashes:
——> ./ollama run gemma3:4b --verbose
hello there can you give me 1000 words on why humans are great
Error: POST predict: Post "http://127.0.0.1:35101/completion": EOF
Here are the logs from (start_ollama.sh) ollama serve:
——> ./start-ollama.sh
2025/06/13 09:25:42 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/guido/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-06-13T09:25:42.623-04:00 level=INFO source=images.go:432 msg="total blobs: 0"
time=2025-06-13T09:25:42.623-04:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers) [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers) [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers) [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers) [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers) [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers) [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers) [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers) [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers) [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers) [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers) [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers) [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers) [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers) [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers) [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers) [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(Server).GenerateHandler-fm (6 handlers) [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(Server).EmbedHandler-fm (6 handlers) [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(Server).ListHandler-fm (6 handlers) [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(Server).ShowHandler-fm (6 handlers) time=2025-06-13T09:25:42.624-04:00 level=INFO source=routes.go:1297 msg="Listening on 127.0.0.1:11434" time=2025-06-13T09:25:42.624-04:00 level=INFO source=routes.go:1298 msg="Version intel-ollama-0.6.2" time=2025-06-13T09:25:42.624-04:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-06-13T09:25:42.624-04:00 level=INFO source=gpu.go:218 msg="using Intel GPU" time=2025-06-13T09:25:42.632-04:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2025-06-13T09:25:42.632-04:00 level=WARN source=amd_linux.go:443 msg="amdgpu detected, but no compatible rocm library found. 
Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install" time=2025-06-13T09:25:42.632-04:00 level=WARN source=amd_linux.go:348 msg="unable to verify rocm library: no suitable rocm found, falling back to CPU" time=2025-06-13T09:25:42.632-04:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="59.4 GiB" available="50.1 GiB" [GIN] 2025/06/13 - 09:25:46 | 200 | 50.208µs | 127.0.0.1 | GET "/api/version" [GIN] 2025/06/13 - 09:25:56 | 200 | 17.139µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:25:56 | 200 | 128.137µs | 127.0.0.1 | GET "/api/tags" [GIN] 2025/06/13 - 09:27:14 | 200 | 14.369µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:27:14 | 404 | 115.937µs | 127.0.0.1 | POST "/api/show" [GIN] 2025/06/13 - 09:27:14 | 200 | 247.174474ms | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:27:21 | 200 | 16.449µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:27:21 | 404 | 60.868µs | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:27:21.789-04:00 level=INFO source=download.go:176 msg="downloading e8ad13eff07a in 16 509 MB part(s)" time=2025-06-13T09:28:55.603-04:00 level=INFO source=download.go:294 msg="e8ad13eff07a part 12 attempt 0 failed: unexpected EOF, retrying in 1s" [GIN] 2025/06/13 - 09:29:36 | 200 | 2m15s | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:29:50 | 200 | 14.36µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:29:50 | 200 | 37.409µs | 127.0.0.1 | GET "/api/tags" [GIN] 2025/06/13 - 09:30:02 | 200 | 17.289µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:30:02 | 404 | 64.399µs | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:30:03.275-04:00 level=INFO source=download.go:176 msg="downloading e8ad13eff07a in 16 509 MB part(s)" [GIN] 2025/06/13 - 09:30:25 | 200 | 22.029µs | 127.0.0.1 | GET "/api/version" time=2025-06-13T09:33:09.172-04:00 level=INFO source=download.go:294 msg="e8ad13eff07a part 2 attempt 0 failed: unexpected EOF, retrying in 1s" [GIN] 2025/06/13 - 09:36:05 | 200 | 19.7µs | 127.0.0.1 | GET "/api/version" time=2025-06-13T09:37:03.060-04:00 level=INFO source=download.go:176 msg="downloading e0a42594d802 in 1 358 B part(s)" time=2025-06-13T09:37:04.243-04:00 level=INFO source=download.go:176 msg="downloading dd084c7d92a3 in 1 8.4 KB part(s)" time=2025-06-13T09:37:05.631-04:00 level=INFO source=download.go:176 msg="downloading 3116c5225075 in 1 77 B part(s)" time=2025-06-13T09:37:06.840-04:00 level=INFO source=download.go:176 msg="downloading 6819964c2bcf in 1 490 B part(s)" [GIN] 2025/06/13 - 09:37:11 | 200 | 7m7s | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:37:11 | 200 | 29.468556ms | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:37:11.974-04:00 level=INFO source=server.go:107 msg="system memory" total="59.4 GiB" free="49.3 GiB" free_swap="57.1 GiB" time=2025-06-13T09:37:11.974-04:00 level=WARN source=server.go:135 msg="model request too large for system" requested="226.1 GiB" available=114262691840 total="59.4 GiB" free="49.3 GiB" swap="57.1 GiB" time=2025-06-13T09:37:11.974-04:00 level=INFO source=sched.go:429 msg="NewLlamaServer failed" model=/home/guido/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de error="model requires more system memory (226.1 GiB) than is available (106.4 GiB)" [GIN] 2025/06/13 - 09:37:11 | 500 | 84.960273ms | 127.0.0.1 | POST "/api/generate" [GIN] 2025/06/13 - 09:38:38 | 200 | 16.04µs | 127.0.0.1 | HEAD "/" [GIN] 2025/06/13 - 09:38:38 | 404 | 
149.818µs | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:38:38.515-04:00 level=INFO source=download.go:176 msg="downloading aeda25e63ebd in 16 208 MB part(s)" time=2025-06-13T09:42:10.800-04:00 level=INFO source=download.go:176 msg="downloading b6ae5839783f in 1 489 B part(s)" [GIN] 2025/06/13 - 09:42:13 | 200 | 3m35s | 127.0.0.1 | POST "/api/pull" [GIN] 2025/06/13 - 09:42:13 | 200 | 28.294458ms | 127.0.0.1 | POST "/api/show" time=2025-06-13T09:42:13.583-04:00 level=INFO source=server.go:107 msg="system memory" total="59.4 GiB" free="50.0 GiB" free_swap="57.1 GiB" time=2025-06-13T09:42:13.583-04:00 level=INFO source=server.go:154 msg=offload library=cpu layers.requested=-1 layers.model=35 layers.offload=0 layers.split="" memory.available="[50.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="87.2 GiB" memory.required.partial="0 B" memory.required.kv="68.0 GiB" memory.required.allocations="[49.7 GiB]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.8 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="9.0 GiB" memory.graph.partial="13.0 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB" time=2025-06-13T09:42:13.648-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+" time=2025-06-13T09:42:13.651-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false time=2025-06-13T09:42:13.653-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+" time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1 time=2025-06-13T09:42:13.657-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256 time=2025-06-13T09:42:13.658-04:00 level=INFO source=server.go:430 msg="starting llama server" cmd="/home/guido/ipex-llm/ollama-ipex-llm-2.3.0b20250612-ubuntu/ollama-bin runner --ollama-engine --model /home/guido/.ollama/models/blobs/sha256-aeda25e63ebd698fab8638ffb778e68bed908b960d39d0becc650fa981609d25 --ctx-size 131072 --batch-size 512 --n-gpu-layers 999 --threads 6 --no-mmap --parallel 4 --port 35101" time=2025-06-13T09:42:13.658-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-06-13T09:42:13.658-04:00 level=INFO source=server.go:605 msg="waiting for llama runner to start responding" time=2025-06-13T09:42:13.658-04:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server error" time=2025-06-13T09:42:13.685-04:00 level=INFO source=runner.go:757 msg="starting ollama engine" time=2025-06-13T09:42:13.685-04:00 level=INFO source=runner.go:817 msg="Server listening on 127.0.0.1:35101" time=2025-06-13T09:42:13.748-04:00 level=WARN source=ggml.go:149 msg="key not found" key=general.name default="" time=2025-06-13T09:42:13.748-04:00 level=WARN 
source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-06-13T09:42:13.748-04:00 level=INFO source=ggml.go:79 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=36
load_backend: loaded SYCL backend from /home/guido/ipex-llm/ollama-ipex-llm-2.3.0b20250612-ubuntu/libggml-sycl.so
load_backend: loaded CPU backend from /home/guido/ipex-llm/ollama-ipex-llm-2.3.0b20250612-ubuntu/libggml-cpu-skylakex.so
time=2025-06-13T09:42:13.795-04:00 level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 1 SYCL devices:
| ID | Device Type        | Name                    | Version | Max compute units | Max work group | sub group | Global mem size | Driver version |
|----|--------------------|-------------------------|---------|-------------------|----------------|-----------|-----------------|----------------|
|  0 | [level_zero:gpu:0] | Intel Arc B580 Graphics | 20.1    | 160               | 1024           | 32        | 12168M          | 1.6.33578+11   |
SYCL Optimization Feature:
| ID | Device Type        | Reorder |
|----|--------------------|---------|
|  0 | [level_zero:gpu:0] | Y       |
time=2025-06-13T09:42:13.909-04:00 level=INFO source=server.go:639 msg="waiting for server to become available" status="llm server loading model"
time=2025-06-13T09:42:13.950-04:00 level=INFO source=ggml.go:349 msg="Number of model weight buffers" count=2
time=2025-06-13T09:42:13.950-04:00 level=INFO source=ggml.go:352 msg="model weights" buffer=CPU size="525.0 MiB"
time=2025-06-13T09:42:13.950-04:00 level=INFO source=ggml.go:352 msg="model weights" buffer=SYCL0 size="3.1 GiB"
time=2025-06-13T09:42:14.945-04:00 level=INFO source=ggml.go:462 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-06-13T09:42:14.945-04:00 level=INFO source=ggml.go:462 msg="compute graph" backend=SYCL0 buffer_type=SYCL0
time=2025-06-13T09:42:14.945-04:00 level=INFO source=ggml.go:462 msg="compute graph" backend=CPU buffer_type=SYCL_Host
time=2025-06-13T09:42:14.946-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-06-13T09:42:14.949-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-06-13T09:42:14.951-04:00 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]|\s[\r\n]+|\s+(?!\S)|\s+"
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-06-13T09:42:14.956-04:00 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-06-13T09:42:15.167-04:00 level=INFO source=server.go:644 msg="llama runner started in 1.51 seconds"
[GIN] 2025/06/13 - 09:42:15 | 200 | 1.667611544s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/06/13 - 09:45:22 | 200 | 19.928µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/06/13 - 09:45:22 | 200 | 27.876264ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/06/13 - 09:45:22 | 200 | 27.719209ms | 127.0.0.1 | POST "/api/generate"
Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:428, func:operator()
SYCL error: CHECK_TRY_ERROR( stream->memset(dest_ptr, static_cast
  in function ggml_backend_sycl_buffer_memset_tensor at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:428
/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:117: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7a4338aa50bc m=46 sigcode=18446744073709551610
signal arrived during cgo execution
Well, I am not an ollama developer, but I know there is a sycl-ls-device exe in the zip folder, and it works for me for checking that the environment is runnable. I think you can give it a try.
Hi all, the new portable zip is uploaded here: https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-win.zip / https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-ubuntu.tgz. In this new portable zip, we have upgraded ollama to v0.9.3. Gemma3 4B, Gemma3 12B, and phi4-mini 3.8B should work now; hope you enjoy it. 😊
@rnwang04, this is working now. Thanks for all the work on this. One thing: on my Intel Battlemage B580 12GB, a lot of these smaller models seem to be using most of the GPU memory. For example, Gemma3 4B uses around 10 GB of VRAM. Is anyone else seeing this? From what I have seen in the past, older ollama versions used less VRAM for these models.
edit: after a reboot and specifying some of the suggested env variables, the gemma3:4b model seems to use 5 GB of VRAM, and gemma3:12b also fits in memory with 11 GB of VRAM.
gemma3:4b -> eval rate: 61.42 tokens/s
gemma3:12b -> eval rate: 33.30 tokens/s
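For anyone comparing VRAM use across builds, the server's /api/ps endpoint (the same data ollama ps prints, and one of the routes visible in the serve log above) reports each loaded model's total size and how much of it resides in GPU memory; a small sketch, again assuming the default address:

```python
# Sketch: show each loaded model's total size and GPU-resident size via the
# standard /api/ps endpoint (same information as `ollama ps`).
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:11434/api/ps", timeout=5) as resp:
    for m in json.load(resp).get("models", []):
        total_gib = m.get("size", 0) / 2**30
        vram_gib = m.get("size_vram", 0) / 2**30
        print(f"{m['name']}: total {total_gib:.1f} GiB, in VRAM {vram_gib:.1f} GiB")
```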