gemma3n unable to run with ipex-llm
Hi @shailesh837, gemma3n is supported starting from ipex-llm[cpp]==2.3.0b20250630.
You could try it first with pip install --pre --upgrade ipex-llm[cpp], or wait for the new ollama portable zip, which we will release soon.
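For example, a minimal setup sketch (the conda env name, the Python version, and the init-ollama helper are illustrative assumptions, not verified in this thread):
# fresh env for the nightly wheels (name/version are just examples)
conda create -n llm-cpp python=3.11 -y
conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]
# ipex-llm[cpp] ships an init-ollama script that links the ollama binary
# into the current directory (Linux; init-ollama.bat on Windows)
mkdir ollama-dir && cd ollama-dir
init-ollama
./ollama serve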
Hi @shailesh837, a new portable zip is uploaded here: https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-win.zip ; with this new portable zip, you can run gemma3n.
Hello, it is not working with the portable zip (2.3.0b20250630-win) or with ipex-llm[cpp] installed in a conda env.
model: gemma3n:latest 15cb39fd9394 7.5 GB (gemma3n:e4b-it-q4_K_M)
And here are some bug fixes in mainline ollama for gemma3n: https://github.com/ollama/ollama/compare/v0.9.3...v0.9.4 (convert/convert_gemma3n.go). Log:
C:\llm\llama-cpp\2.3.0b20250630>ollama serve
time=2025-07-02T11:36:28.640+03:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\uuk\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-07-02T11:36:28.702+03:00 level=INFO source=images.go:476 msg="total blobs: 11"
time=2025-07-02T11:36:28.704+03:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-07-02T11:36:28.737+03:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-02T11:36:28.738+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-02T11:36:28.742+03:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-07-02T11:36:28.743+03:00 level=INFO source=gpu_windows.go:167 msg=packages count=2
time=2025-07-02T11:36:28.744+03:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=18 efficiency=0 threads=36
time=2025-07-02T11:36:28.745+03:00 level=INFO source=gpu_windows.go:214 msg="" package=1 cores=18 efficiency=0 threads=36
time=2025-07-02T11:36:28.774+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="127.9 GiB" available="116.0 GiB"
[GIN] 2025/07/02 - 11:36:58 | 200 | 503.4µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 11:36:58 | 200 | 7.0101ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/07/02 - 11:37:07 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 11:37:07 | 200 | 290.2304ms | 127.0.0.1 | POST "/api/show"
time=2025-07-02T11:37:08.381+03:00 level=INFO source=server.go:135 msg="system memory" total="127.9 GiB" free="116.0 GiB" free_swap="122.1 GiB"
time=2025-07-02T11:37:08.383+03:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[116.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-02T11:37:08.531+03:00 level=INFO source=server.go:458 msg="starting llama server" cmd="C:\\llm\\llama-cpp\\2.3.0b20250630\\ollama-lib.exe runner --ollama-engine --model C:\\Users\\uuk\\.ollama\\models\\blobs\\sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 36 --no-mmap --parallel 2 --port 54345"
time=2025-07-02T11:37:08.534+03:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-02T11:37:08.535+03:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-02T11:37:08.537+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server error"
time=2025-07-02T11:37:08.668+03:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-02T11:37:08.713+03:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:54345"
time=2025-07-02T11:37:08.789+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
time=2025-07-02T11:37:08.869+03:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40
time=2025-07-02T11:37:08.877+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(clang)
Running with Environment Variables:
GGML_SYCL_DEBUG: 0
GGML_SYCL_DISABLE_OPT: 1
GGML_SYCL_DISABLE_GRAPH: 1
GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
GGML_SYCL_FORCE_MMQ: no
GGML_SYCL_F16: no
Found 1 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16704M| 1.6.33511|
SYCL Optimization Feature:
|ID| Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]| Y|
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 402653184 Bytes of memory on device
alloc_tensor_range: failed to allocate SYCL0 buffer of size 4697620480
panic: insufficient memory - required allocations: {InputWeights:440832000A CPU:{Name:CPU UUID: Weights:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U} GPUs:[{Name:SYCL0 UUID: Weights:[72269184F 72269184F 72269184F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 72269184F 72269184F 72269184F 72269184F 5162939392F] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U}]}
goroutine 52 [running]:
github.com/ollama/ollama/ml/backend/ggml.New({0xc000038150, 0x69}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:380 +0x3116
github.com/ollama/ollama/ml.NewBackend({0xc000038150, 0x69}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1
github.com/ollama/ollama/model.New({0xc000038150?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc00051c000, {0xc000038150?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00051c000, {0x7ff7df78c9d0, 0xc000194af0}, {0xc000038150?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-07-02T11:37:12.052+03:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/07/02 - 11:37:12 | 500 | 4.1449975s | 127.0.0.1 | POST "/api/generate"
@rnwang04: Please can you test it, and which iGPU and dGPU combination can I suggest to our EMEA customer?
When running "ollama run gemma3n:e4b" in [ollama-ipex-llm-2.3.0b20250630], I got: Error: llama runner process has terminated: exit status 2
Which is the more recommended version, the nightly release or the docker container? I tried gemma3n with the latest docker container and it doesn't work.
Hi @FilipLaurentiu, we recommend installing the latest portable zip (https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-win.zip / https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-ubuntu.tgz). Please try again with it.
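For example, on Ubuntu (the extracted directory name is assumed to match the tarball; start-ollama.sh and the bundled ollama binary both appear later in this thread):
wget https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250630-ubuntu.tgz
tar xzf ollama-ipex-llm-2.3.0b20250630-ubuntu.tgz
cd ollama-ipex-llm-2.3.0b20250630-ubuntu
./start-ollama.sh            # starts the server on 127.0.0.1:11434
# in a second terminal, from the same directory:
./ollama run gemma3n:e4b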
Hi @yizhangliu, could you please provide us with a detailed server log? On my side, I can run gemma3n:e4b successfully.
arda@arda-arc:~/ollama-ipex-llm-2.3.0b20250630-ubuntu$ ./ollama run gemma3n:e4b
>>> hi
Hi there! 👋
How can I help you today? I'm ready for anything - questions, stories, brainstorming, or just a friendly chat. 😊
Let me know what's on your mind!
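If it still fails, a simple way to capture the full server log would be (OLLAMA_DEBUG shows up in the server config dump above; setting it to 1 for extra verbosity is an assumption):
export OLLAMA_DEBUG=1
./start-ollama.sh > server.log 2>&1
# reproduce the failure in another terminal, then attach server.log:
./ollama run gemma3n:e4b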
Hi @savvadesogle, I couldn't reproduce the error on my Linux machine; on my side, gemma3n:e4b and gemma3n:e2b both work with the latest ollama portable zip. Based on my observation, gemma3n:e4b uses about 8 GB of GPU memory.
Below is my log:
time=2025-07-02T17:42:43.638+08:00 level=INFO source=server.go:135 msg="system memory" total="62.6 GiB" free="50.8 GiB" free_swap="7.9 GiB"
time=2025-07-02T17:42:43.638+08:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[50.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-02T17:42:43.671+08:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/arda/ruonan/ollama-ipex-llm-2.3.0b20250630-ubuntu/ollama-bin runner --ollama-engine --model /home/arda/.ollama/models/blobs/sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 8 --no-mmap --parallel 2 --port 46637"
time=2025-07-02T17:42:43.672+08:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-02T17:42:43.672+08:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-02T17:42:43.672+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-02T17:42:43.696+08:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-02T17:42:43.696+08:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:46637"
time=2025-07-02T17:42:43.731+08:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40
load_backend: loaded SYCL backend from /home/arda/ruonan/ollama-ipex-llm-2.3.0b20250630-ubuntu/libggml-sycl.so
load_backend: loaded CPU backend from /home/arda/ruonan/ollama-ipex-llm-2.3.0b20250630-ubuntu/libggml-cpu-alderlake.so
time=2025-07-02T17:42:43.757+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
GGML_SYCL_DEBUG: 0
GGML_SYCL_DISABLE_OPT: 1
GGML_SYCL_DISABLE_GRAPH: 1
GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
GGML_SYCL_FORCE_MMQ: no
GGML_SYCL_F16: no
Found 1 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16704M| 1.6.33511|
SYCL Optimization Feature:
|ID| Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]| Y|
time=2025-07-02T17:42:43.839+08:00 level=INFO source=ggml.go:388 msg="model weights" buffer=SYCL0 size="7.0 GiB"
time=2025-07-02T17:42:43.839+08:00 level=INFO source=ggml.go:388 msg="model weights" buffer=CPU size="420.4 MiB"
time=2025-07-02T17:42:43.839+08:00 level=INFO source=ggml.go:417 msg="Sycl device count" count=1
time=2025-07-02T17:42:43.854+08:00 level=INFO source=ggml.go:710 msg="compute graph" backend=SYCL0 buffer_type=SYCL0 size="201.7 MiB"
time=2025-07-02T17:42:43.854+08:00 level=INFO source=ggml.go:710 msg="compute graph" backend=SYCL0 buffer_type=SYCL0 size="0 B"
time=2025-07-02T17:42:43.854+08:00 level=INFO source=ggml.go:710 msg="compute graph" backend=CPU buffer_type=CPU size="1.4 GiB"
time=2025-07-02T17:42:43.923+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
time=2025-07-02T17:42:47.438+08:00 level=INFO source=server.go:657 msg="llama runner started in 3.77 seconds"
[GIN] 2025/07/02 - 17:42:47 | 200 | 3.899137611s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/07/02 - 17:42:51 | 200 | 1.659758203s | 127.0.0.1 | POST "/api/chat"
Could you please double-check on your side, especially your GPU memory usage?
your GPU memory usage?
What did you mean?
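Presumably this means how much device memory is occupied while the model loads. On Linux it can be inspected with the oneAPI utilities, e.g. (a sketch; xpu-smi availability and the exact flags are assumptions and may vary by version):
sycl-ls                  # list the SYCL devices the runtime sees
xpu-smi discovery        # enumerate Intel GPUs
xpu-smi stats -d 0       # per-device stats, including memory used, for device 0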
I downloaded and tried it again; still not working. (Windows 11, ollama version 0.9.3, ollama-ipex-llm-2.3.0b20250630-win.zip)
C:\test>ollama serve
time=2025-07-02T13:49:08.063+03:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\uuk\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-07-02T13:49:08.103+03:00 level=INFO source=images.go:476 msg="total blobs: 11"
time=2025-07-02T13:49:08.104+03:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-07-02T13:49:08.133+03:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-02T13:49:08.133+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-02T13:49:08.134+03:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-07-02T13:49:08.135+03:00 level=INFO source=gpu_windows.go:167 msg=packages count=2
time=2025-07-02T13:49:08.135+03:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=18 efficiency=0 threads=36
time=2025-07-02T13:49:08.136+03:00 level=INFO source=gpu_windows.go:214 msg="" package=1 cores=18 efficiency=0 threads=36
time=2025-07-02T13:49:08.154+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="127.9 GiB" available="114.3 GiB"
[GIN] 2025/07/02 - 13:49:12 | 200 | 499.2µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 13:49:12 | 200 | 301.541ms | 127.0.0.1 | POST "/api/show"
time=2025-07-02T13:49:12.893+03:00 level=INFO source=server.go:135 msg="system memory" total="127.9 GiB" free="114.2 GiB" free_swap="119.9 GiB"
time=2025-07-02T13:49:12.895+03:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[114.2 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-02T13:49:13.063+03:00 level=INFO source=server.go:458 msg="starting llama server" cmd="C:\\test\\ollama-lib.exe runner --ollama-engine --model C:\\Users\\uuk\\.ollama\\models\\blobs\\sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 36 --no-mmap --parallel 2 --port 54049"
time=2025-07-02T13:49:13.066+03:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-02T13:49:13.067+03:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-02T13:49:13.068+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server error"
time=2025-07-02T13:49:13.190+03:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-02T13:49:13.227+03:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:54049"
time=2025-07-02T13:49:13.322+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
time=2025-07-02T13:49:13.538+03:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40
time=2025-07-02T13:49:13.553+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(clang)
Running with Environment Variables:
GGML_SYCL_DEBUG: 0
GGML_SYCL_DISABLE_OPT: 1
GGML_SYCL_DISABLE_GRAPH: 1
GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
GGML_SYCL_FORCE_MMQ: no
GGML_SYCL_F16: no
Found 4 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16704M| 1.6.33511|
| 1| [level_zero:gpu:1]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16704M| 1.6.33511|
| 2| [level_zero:gpu:2]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16704M| 1.6.33511|
| 3| [level_zero:gpu:3]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16704M| 1.6.33511|
SYCL Optimization Feature:
|ID| Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]| Y|
| 1| [level_zero:gpu:1]| Y|
| 2| [level_zero:gpu:2]| Y|
| 3| [level_zero:gpu:3]| Y|
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 402653184 Bytes of memory on device
alloc_tensor_range: failed to allocate SYCL0 buffer of size 4697620480
panic: insufficient memory - required allocations: {InputWeights:440832000A CPU:{Name:CPU UUID: Weights:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U} GPUs:[{Name:SYCL0 UUID: Weights:[72269184F 72269184F 72269184F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 72269184F 72269184F 72269184F 72269184F 5162939392F] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U}]}
goroutine 66 [running]:
github.com/ollama/ollama/ml/backend/ggml.New({0xc000038150, 0x69}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:380 +0x3116
github.com/ollama/ollama/ml.NewBackend({0xc000038150, 0x69}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1
github.com/ollama/ollama/model.New({0xc000038150?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc00004c120, {0xc000038150?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc00004c120, {0x7ff642d7c9d0, 0xc000192af0}, {0xc000038150?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
D:/actions-runner/release-cpp-oneapi_2024_2/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-07-02T13:49:17.340+03:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/07/02 - 13:49:17 | 500 | 4.9278679s | 127.0.0.1 | POST "/api/generate"
ollama-ipex-llm-2.3.0b20250630-ubuntu$ ./start-ollama.sh
time=2025-07-02T17:01:39.402+03:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/filip/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-07-02T17:01:39.402+03:00 level=INFO source=images.go:476 msg="total blobs: 23"
time=2025-07-02T17:01:39.403+03:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-07-02T17:01:39.403+03:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-02T17:01:39.403+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-02T17:01:39.403+03:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-07-02T17:01:39.431+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="62.3 GiB" available="55.2 GiB"
[GIN] 2025/07/02 - 17:02:06 | 200 | 59.964µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 17:02:06 | 404 | 475.631µs | 127.0.0.1 | POST "/api/show"
[GIN] 2025/07/02 - 17:02:06 | 200 | 671.350852ms | 127.0.0.1 | POST "/api/pull"
[GIN] 2025/07/02 - 17:02:11 | 200 | 45.818µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 17:02:11 | 200 | 928.085µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/07/02 - 17:02:24 | 200 | 31.246µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 17:02:25 | 200 | 112.861202ms | 127.0.0.1 | POST "/api/show"
time=2025-07-02T17:02:25.237+03:00 level=INFO source=server.go:135 msg="system memory" total="62.3 GiB" free="55.6 GiB" free_swap="4.0 GiB"
time=2025-07-02T17:02:25.239+03:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[55.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.8 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[10.8 GiB]" memory.weights.total="8.3 GiB" memory.weights.repeating="7.3 GiB" memory.weights.nonrepeating="1.0 GiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-02T17:02:25.309+03:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/filip/Downloads/ollama-ipex-llm-2.3.0b20250630-ubuntu/ollama-bin runner --ollama-engine --model /home/filip/.ollama/models/blobs/sha256-e4e9439b8ee5aa6213d487a6780c0618fe971e0582e11b7cc3741a1230205644 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 8 --no-mmap --parallel 2 --port 40237"
time=2025-07-02T17:02:25.310+03:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-02T17:02:25.310+03:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-02T17:02:25.311+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-02T17:02:25.365+03:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-02T17:02:25.365+03:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:40237"
time=2025-07-02T17:02:25.427+03:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=F16 name="" description="" num_tensors=847 num_key_values=40
SIGSEGV: segmentation violation
PC=0x7f651452276f m=7 sigcode=1 addr=0x8
signal arrived during cgo execution
goroutine 13 gp=0xc000103dc0 m=7 mp=0xc000580008 [syscall]:
runtime.cgocall(0x1168450, 0xc000050d58)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/cgocall.go:167 +0x4b fp=0xc000050d30 sp=0xc000050cf8 pc=0x4977ab
github.com/ollama/ollama/ml/backend/ggml/ggml/src._Cfunc_ggml_backend_load_all_from_path(0x7f64b0000d30)
_cgo_gotypes.go:195 +0x3a fp=0xc000050d58 sp=0xc000050d30 pc=0x84301a
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.func1.1({0xc0000400b4, 0x3b})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml/src/ggml.go:97 +0xf5 fp=0xc000050df0 sp=0xc000050d58 pc=0x842ab5
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.func1()
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml/src/ggml.go:98 +0x526 fp=0xc000051080 sp=0xc000050df0 pc=0x842906
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.OnceFunc.func2()
/root/go/pkg/mod/golang.org/[email protected]/src/sync/oncefunc.go:27 +0x62 fp=0xc0000510c8 sp=0xc000051080 pc=0x842302
sync.(*Once).doSlow(0xc000051150?, 0x498739?)
/root/go/pkg/mod/golang.org/[email protected]/src/sync/once.go:78 +0xab fp=0xc000051120 sp=0xc0000510c8 pc=0x4ac78b
sync.(*Once).Do(0xc0004c21b0?, 0xc000051178?)
/root/go/pkg/mod/golang.org/[email protected]/src/sync/once.go:69 +0x19 fp=0xc000051140 sp=0xc000051120 pc=0x4ac6b9
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.OnceFunc.func3()
/root/go/pkg/mod/golang.org/[email protected]/src/sync/oncefunc.go:32 +0x2d fp=0xc000051170 sp=0xc000051140 pc=0x84226d
github.com/ollama/ollama/ml/backend/ggml.devices()
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:41 +0x1e fp=0xc0000511b0 sp=0xc000051170 pc=0x8d63de
github.com/ollama/ollama/ml/backend/ggml.New({0x7ffebeb84e11, 0x68}, {0x8, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:115 +0x537 fp=0xc000051c18 sp=0xc0000511b0 pc=0x8d69b7
github.com/ollama/ollama/ml.NewBackend({0x7ffebeb84e11, 0x68}, {0x8, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1 fp=0xc000051c70 sp=0xc000051c18 pc=0x8c9311
github.com/ollama/ollama/model.New({0x7ffebeb84e11?, 0x0?}, {0x8, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f fp=0xc000051d68 sp=0xc000051c70 pc=0x8e706f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc0001339e0, {0x7ffebeb84e11?, 0x0?}, {0x8, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d fp=0xc000051dc8 sp=0xc000051d68 pc=0x9895ad
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc0001339e0, {0x15f3a90, 0xc000530a50}, {0x7ffebeb84e11?, 0x0?}, {0x8, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8 fp=0xc000051f20 sp=0xc000051dc8 pc=0x989918
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xc7 fp=0xc000051fe0 sp=0xc000051f20 pc=0x98ad47
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000051fe8 sp=0xc000051fe0 pc=0x4a2281
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000687650 sp=0xc000687630 pc=0x49ac2e
runtime.netpollblock(0x4989b3?, 0x433d46?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:575 +0xf7 fp=0xc000687688 sp=0xc000687650 pc=0x45f8f7
internal/poll.runtime_pollWait(0x7f652a8c7eb0, 0x72)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:351 +0x85 fp=0xc0006876a8 sp=0xc000687688 pc=0x499e45
internal/poll.(*pollDesc).wait(0xc000496d00?, 0x380016?, 0x0)
/root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0006876d0 sp=0xc0006876a8 pc=0x521167
internal/poll.(*pollDesc).waitRead(...)
/root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000496d00)
/root/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:620 +0x295 fp=0xc000687778 sp=0xc0006876d0 pc=0x526535
net.(*netFD).accept(0xc000496d00)
/root/go/pkg/mod/golang.org/[email protected]/src/net/fd_unix.go:172 +0x29 fp=0xc000687830 sp=0xc000687778 pc=0x598a49
net.(*TCPListener).accept(0xc00053f000)
/root/go/pkg/mod/golang.org/[email protected]/src/net/tcpsock_posix.go:159 +0x1b fp=0xc000687880 sp=0xc000687830 pc=0x5ae3bb
net.(*TCPListener).Accept(0xc00053f000)
/root/go/pkg/mod/golang.org/[email protected]/src/net/tcpsock.go:380 +0x30 fp=0xc0006878b0 sp=0xc000687880 pc=0x5ad270
net/http.(*onceCloseListener).Accept(0x15f3a20?)
<autogenerated>:1 +0x24 fp=0xc0006878c8 sp=0xc0006878b0 pc=0x7c4904
net/http.(*Server).Serve(0xc0001fe600, {0x15f1468, 0xc00053f000})
/root/go/pkg/mod/golang.org/[email protected]/src/net/http/server.go:3424 +0x30c fp=0xc0006879f8 sp=0xc0006878c8 pc=0x79c1cc
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc000034150, 0xf, 0xf})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:984 +0xe09 fp=0xc000687d08 sp=0xc0006879f8 pc=0x98a9a9
github.com/ollama/ollama/runner.Execute({0xc000034130?, 0x0?, 0x0?})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/runner.go:20 +0xc9 fp=0xc000687d30 sp=0xc000687d08 pc=0x98b2a9
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000119000?, {0x1419fa2?, 0x4?, 0x1419fa6?})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/cmd/cmd.go:1529 +0x45 fp=0xc000687d58 sp=0xc000687d30 pc=0x10e7a45
github.com/spf13/cobra.(*Command).execute(0xc0004c4f08, {0xc000119200, 0x10, 0x10})
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:940 +0x85c fp=0xc000687e78 sp=0xc000687d58 pc=0x61205c
github.com/spf13/cobra.(*Command).ExecuteC(0xc00017b208)
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1068 +0x3a5 fp=0xc000687f30 sp=0xc000687e78 pc=0x6128a5
github.com/spf13/cobra.(*Command).Execute(...)
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/root/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
main.main()
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/main.go:12 +0x4d fp=0xc000687f50 sp=0xc000687f30 pc=0x10e84cd
runtime.main()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:283 +0x28b fp=0xc000687fe0 sp=0xc000687f50 pc=0x466f0b
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000687fe8 sp=0xc000687fe0 pc=0x4a2281
goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000088fa8 sp=0xc000088f88 pc=0x49ac2e
runtime.goparkunlock(...)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:441
runtime.forcegchelper()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:348 +0xb3 fp=0xc000088fe0 sp=0xc000088fa8 pc=0x467253
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000088fe8 sp=0xc000088fe0 pc=0x4a2281
created by runtime.init.7 in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:336 +0x1a
goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000089780 sp=0xc000089760 pc=0x49ac2e
runtime.goparkunlock(...)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:441
runtime.bgsweep(0xc000048080)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgcsweep.go:316 +0xdf fp=0xc0000897c8 sp=0xc000089780 pc=0x451a7f
runtime.gcenable.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:204 +0x25 fp=0xc0000897e0 sp=0xc0000897c8 pc=0x445ee5
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000897e8 sp=0xc0000897e0 pc=0x4a2281
created by runtime.gcenable in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x15dea98?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000089f78 sp=0xc000089f58 pc=0x49ac2e
runtime.goparkunlock(...)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x20e1940)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000089fa8 sp=0xc000089f78 pc=0x44f4c9
runtime.bgscavenge(0xc000048080)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000089fc8 sp=0xc000089fa8 pc=0x44fa59
runtime.gcenable.gowrap2()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:205 +0x25 fp=0xc000089fe0 sp=0xc000089fc8 pc=0x445e85
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000089fe8 sp=0xc000089fe0 pc=0x4a2281
created by runtime.gcenable in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000088688?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000088630 sp=0xc000088610 pc=0x49ac2e
runtime.runfinq()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mfinal.go:196 +0x107 fp=0xc0000887e0 sp=0xc000088630 pc=0x444ea7
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000887e8 sp=0xc0000887e0 pc=0x4a2281
created by runtime.createfing in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mfinal.go:166 +0x3d
goroutine 6 gp=0xc0001e68c0 m=nil [chan receive]:
runtime.gopark(0xc00018f9a0?, 0xc00011a048?, 0x60?, 0xa7?, 0x57f788?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00008a718 sp=0xc00008a6f8 pc=0x49ac2e
runtime.chanrecv(0xc00003c380, 0x0, 0x1)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/chan.go:664 +0x445 fp=0xc00008a790 sp=0xc00008a718 pc=0x4368c5
runtime.chanrecv1(0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/chan.go:506 +0x12 fp=0xc00008a7b8 sp=0xc00008a790 pc=0x436452
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1799 +0x2f fp=0xc00008a7e0 sp=0xc00008a7b8 pc=0x44902f
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00008a7e8 sp=0xc00008a7e0 pc=0x4a2281
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1794 +0x79
goroutine 7 gp=0xc0001e6c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00008af38 sp=0xc00008af18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc00008afc8 sp=0xc00008af38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc00008afe0 sp=0xc00008afc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 18 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000084738 sp=0xc000084718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc0000847c8 sp=0xc000084738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc0000847e0 sp=0xc0000847c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 19 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000084f38 sp=0xc000084f18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc000084fc8 sp=0xc000084f38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc000084fe0 sp=0xc000084fc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 34 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 20 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000085738 sp=0xc000085718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc0000857c8 sp=0xc000085738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 21 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000085f38 sp=0xc000085f18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc000085fc8 sp=0xc000085f38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 22 gp=0xc000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000086738 sp=0xc000086718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc0000867c8 sp=0xc000086738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc0000867e0 sp=0xc0000867c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 23 gp=0xc000102c40 m=nil [GC worker (idle)]:
runtime.gopark(0xc35bebde049?, 0x3?, 0xde?, 0x91?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 24 gp=0xc000102e00 m=nil [GC worker (idle)]:
runtime.gopark(0xc35beaf5fd8?, 0x1?, 0x68?, 0xd?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 25 gp=0xc000102fc0 m=nil [GC worker (idle)]:
runtime.gopark(0xc35bec321c8?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 8 gp=0xc0001e6e00 m=nil [GC worker (idle)]:
runtime.gopark(0xc35bec88804?, 0x1?, 0xf?, 0x64?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00008b738 sp=0xc00008b718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc00008b7c8 sp=0xc00008b738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc00008b7e0 sp=0xc00008b7c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00008b7e8 sp=0xc00008b7e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 9 gp=0xc0001e6fc0 m=nil [GC worker (idle)]:
runtime.gopark(0xc35beb799a4?, 0x1?, 0x43?, 0xc8?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00008bf38 sp=0xc00008bf18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc00008bfc8 sp=0xc00008bf38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc00008bfe0 sp=0xc00008bfc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00008bfe8 sp=0xc00008bfe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 35 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0xc35beabb2b1?, 0x3?, 0x8d?, 0x61?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00050af38 sp=0xc00050af18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc00050afc8 sp=0xc00050af38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc00050afe0 sp=0xc00050afc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00050afe8 sp=0xc00050afe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 10 gp=0xc0001e7180 m=nil [GC worker (idle)]:
runtime.gopark(0xc35beae64fd?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000506738 sp=0xc000506718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc0005067c8 sp=0xc000506738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc0005067e0 sp=0xc0005067c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0005067e8 sp=0xc0005067e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 11 gp=0xc0001e7340 m=nil [GC worker (idle)]:
runtime.gopark(0xc35bebb2ca8?, 0x1?, 0xbb?, 0x85?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000506f38 sp=0xc000506f18 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc000506fc8 sp=0xc000506f38 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc000506fe0 sp=0xc000506fc8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000506fe8 sp=0xc000506fe0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 12 gp=0xc0001e7500 m=nil [GC worker (idle)]:
runtime.gopark(0xc35be85e253?, 0x0?, 0x0?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc000507738 sp=0xc000507718 pc=0x49ac2e
runtime.gcBgMarkWorker(0xc00003d7a0)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1423 +0xe9 fp=0xc0005077c8 sp=0xc000507738 pc=0x448349
runtime.gcBgMarkStartWorkers.gowrap1()
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x25 fp=0xc0005077e0 sp=0xc0005077c8 pc=0x448225
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0005077e8 sp=0xc0005077e0 pc=0x4a2281
created by runtime.gcBgMarkStartWorkers in goroutine 1
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/mgc.go:1339 +0x105
goroutine 14 gp=0xc000504700 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x60?, 0x0?, 0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:435 +0xce fp=0xc00050d6d0 sp=0xc00050d6b0 pc=0x49ac2e
runtime.goparkunlock(...)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/proc.go:441
runtime.semacquire1(0xc0001339e8, 0x0, 0x1, 0x0, 0x18)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/sema.go:188 +0x21d fp=0xc00050d738 sp=0xc00050d6d0 pc=0x47a41d
sync.runtime_SemacquireWaitGroup(0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/sema.go:110 +0x25 fp=0xc00050d770 sp=0xc00050d738 pc=0x49c625
sync.(*WaitGroup).Wait(0x0?)
/root/go/pkg/mod/golang.org/[email protected]/src/sync/waitgroup.go:118 +0x48 fp=0xc00050d798 sp=0xc00050d770 pc=0x4adbc8
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc0001339e0, {0x15f3a90, 0xc000530a50})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:355 +0x25 fp=0xc00050d7b8 sp=0xc00050d798 pc=0x9854c5
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap2()
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:960 +0x28 fp=0xc00050d7e0 sp=0xc00050d7b8 pc=0x98ac48
runtime.goexit({})
/root/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00050d7e8 sp=0xc00050d7e0 pc=0x4a2281
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:960 +0xa74
rax 0x7f64b0004340
rbx 0x0
rcx 0x7f64caff8570
rdx 0x4
rdi 0x0
rsi 0x0
rbp 0x0
rsp 0x7f64caff8450
r8 0x0
r9 0x7f64b0000030
r10 0x0
r11 0xf0ca7badcf26f777
r12 0x0
r13 0x7f64b000f1c0
r14 0x7f64caff8570
r15 0x4
rip 0x7f651452276f
rflags 0x10202
cs 0x33
fs 0x0
gs 0x0
time=2025-07-02T17:02:25.561+03:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/07/02 - 17:02:25 | 500 | 511.694099ms | 127.0.0.1 | POST "/api/generate"
And I have the same error for gemma3:4b-it-fp16.
Intel B580
Same on Ubuntu 22.04.5 (A770), 6.5.0-35 kernel.
arc@xpu:~/Downloads/ollama-ipex-llm-2.3.0b20250630-ubuntu$ ./start-ollama.sh
time=2025-07-02T17:24:44.742+03:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/arc/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-07-02T17:24:44.742+03:00 level=INFO source=images.go:476 msg="total blobs: 10"
time=2025-07-02T17:24:44.743+03:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-07-02T17:24:44.743+03:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)"
time=2025-07-02T17:24:44.743+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-02T17:24:44.743+03:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-07-02T17:24:44.756+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="125.8 GiB" available="120.9 GiB"
[GIN] 2025/07/02 - 17:24:59 | 200 | 102.481µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 17:24:59 | 200 | 208.347134ms | 127.0.0.1 | POST "/api/show"
time=2025-07-02T17:24:59.881+03:00 level=INFO source=server.go:135 msg="system memory" total="125.8 GiB" free="120.8 GiB" free_swap="0 B"
time=2025-07-02T17:24:59.882+03:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[120.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-02T17:25:00.065+03:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/arc/Downloads/ollama-ipex-llm-2.3.0b20250630-ubuntu/ollama-bin runner --ollama-engine --model /home/arc/.ollama/models/blobs/sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 36 --no-mmap --parallel 2 --port 33041"
time=2025-07-02T17:25:00.066+03:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-02T17:25:00.066+03:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-02T17:25:00.066+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-02T17:25:00.135+03:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-02T17:25:00.136+03:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:33041"
time=2025-07-02T17:25:00.235+03:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40
load_backend: loaded SYCL backend from /home/arc/Downloads/ollama-ipex-llm-2.3.0b20250630-ubuntu/libggml-sycl.so
load_backend: loaded CPU backend from /home/arc/Downloads/ollama-ipex-llm-2.3.0b20250630-ubuntu/libggml-cpu-haswell.so
time=2025-07-02T17:25:00.291+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
GGML_SYCL_DEBUG: 0
GGML_SYCL_DISABLE_OPT: 1
GGML_SYCL_DISABLE_GRAPH: 1
GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
GGML_SYCL_FORCE_MMQ: no
GGML_SYCL_F16: no
Found 4 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16225M| 1.3.27642|
| 1| [level_zero:gpu:1]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16225M| 1.3.27642|
| 2| [level_zero:gpu:2]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16225M| 1.3.27642|
| 3| [level_zero:gpu:3]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16225M| 1.3.27642|
SYCL Optimization Feature:
|ID| Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]| Y|
| 1| [level_zero:gpu:1]| Y|
| 2| [level_zero:gpu:2]| Y|
| 3| [level_zero:gpu:3]| Y|
time=2025-07-02T17:25:00.318+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 4697620480 Bytes of memory on device
alloc_tensor_range: failed to allocate SYCL0 buffer of size 4697620480
panic: insufficient memory - required allocations: {InputWeights:440832000A CPU:{Name:CPU UUID: Weights:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U} GPUs:[{Name:SYCL0 UUID: Weights:[72269184F 72269184F 72269184F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 63348096F 63348096F 72269184F 72269184F 72269184F 72269184F 72269184F 5162939392F] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U}]}
goroutine 87 [running]:
github.com/ollama/ollama/ml/backend/ggml.New({0x7ffd807600ce, 0x66}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:380 +0x30b6
github.com/ollama/ollama/ml.NewBackend({0x7ffd807600ce, 0x66}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1
github.com/ollama/ollama/model.New({0x7ffd807600ce?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc000034360, {0x7ffd807600ce?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000034360, {0x15f3a90, 0xc00040d950}, {0x7ffd807600ce?, 0x0?}, {0x24, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-07-02T17:25:01.070+03:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/07/02 - 17:25:01 | 500 | 1.61100836s | 127.0.0.1 | POST "/api/generate"
@rnwang04
[GIN] 2025/07/02 - 22:18:00 | 500 | 1.254786566s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/07/02 - 22:18:38 | 200 | 21.324µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/02 - 22:18:39 | 200 | 655.776804ms | 127.0.0.1 | POST "/api/show"
time=2025-07-02T22:18:39.599+08:00 level=INFO source=server.go:135 msg="system memory" total="15.6 GiB" free="13.1 GiB" free_swap="16.0 GiB"
time=2025-07-02T22:18:39.599+08:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=31 layers.offload=0 layers.split="" memory.available="[13.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.7 GiB" memory.required.partial="0 B" memory.required.kv="240.0 MiB" memory.required.allocations="[3.7 GiB]" memory.weights.total="1.4 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB"
time=2025-07-02T22:18:39.712+08:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/liuyz/packages/ollama-ipex-llm-2.3.0b20250630-ubuntu/ollama-bin runner --ollama-engine --model /mnt/d/Ollama_models/blobs/sha256-3839a254cf2d00b208c6e2524c129e4438f9d106bba4c3fbc12b631f519d1de1 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 10 --no-mmap --parallel 2 --port 42955"
time=2025-07-02T22:18:39.712+08:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-07-02T22:18:39.712+08:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-02T22:18:39.713+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
time=2025-07-02T22:18:39.742+08:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-07-02T22:18:39.744+08:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:42955"
time=2025-07-02T22:18:39.965+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
time=2025-07-02T22:18:40.060+08:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=727 num_key_values=40
load_backend: loaded SYCL backend from /home/liuyz/packages/ollama-ipex-llm-2.3.0b20250630-ubuntu/libggml-sycl.so
load_backend: loaded CPU backend from /home/liuyz/packages/ollama-ipex-llm-2.3.0b20250630-ubuntu/libggml-cpu-alderlake.so
time=2025-07-02T22:18:40.108+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Running with Environment Variables:
GGML_SYCL_DEBUG: 0
GGML_SYCL_DISABLE_OPT: 1
GGML_SYCL_DISABLE_GRAPH: 1
GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
GGML_SYCL_FORCE_MMQ: no
GGML_SYCL_F16: no
Found 1 SYCL devices:
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Graphics [0xa7a0]| 12.3| 96| 512| 32| 15508M| 1.6.32224+14|
SYCL Optimization Feature:
|ID| Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]| N|
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 4026531840 Bytes of memory on device
alloc_tensor_range: failed to allocate SYCL0 buffer of size 4026531840
panic: insufficient memory - required allocations: {InputWeights:440832000A CPU:{Name:CPU UUID: Weights:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U} GPUs:[{Name:SYCL0 UUID: Weights:[39632256F 39632256F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 35036544F 35036544F 39632256F 39632256F 39632256F 39632256F 4490376192F] Cache:[0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U 0U] Graph:0U}]}
goroutine 11 [running]:
github.com/ollama/ollama/ml/backend/ggml.New({0x7fff98af3f86, 0x62}, {0xa, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend/ggml/ggml.go:380 +0x30b6
github.com/ollama/ollama/ml.NewBackend({0x7fff98af3f86, 0x62}, {0xa, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/ml/backend.go:209 +0xb1
github.com/ollama/ollama/model.New({0x7fff98af3f86?, 0x0?}, {0xa, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0})
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/model/model.go:102 +0x8f
github.com/ollama/ollama/runner/ollamarunner.(*Server).initModel(0xc000160120, {0x7fff98af3f86?, 0x0?}, {0xa, 0x0, 0x3e7, {0x0, 0x0, 0x0}, 0x0}, ...)
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:841 +0x8d
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0xc000160120, {0x15f3a90, 0xc000174550}, {0x7fff98af3f86?, 0x0?}, {0xa, 0x0, 0x3e7, {0x0, 0x0, ...}, ...}, ...)
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:878 +0xb8
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
/home/runner/_work/llm.cpp/llm.cpp/ollama-internal/runner/ollamarunner/runner.go:959 +0xa11
time=2025-07-02T22:18:40.466+08:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
@savvadesogle @yizhangliu It seems you are facing the same error. Have you ever run other ollama models successfully, or run another version of the ollama portable zip successfully, on your machine? Also, have you enabled Resizable BAR in your system's BIOS? For detailed steps, please refer to the official guidance here. @savvadesogle Besides, multi-card is currently not supported for the ollama runner, as some bugs remain; we are working on a fix. So if you want to run gemma3n, please limit it to one card first:
# linux
export ONEAPI_DEVICE_SELECTOR=level_zero:0
./ollama serve
# windows
set ONEAPI_DEVICE_SELECTOR=level_zero:0
ollama serve
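(A hedged aside, not from the original reply: ONEAPI_DEVICE_SELECTOR also accepts a comma-separated list of device indices, so exposing exactly two cards would look like the sketch below.)
# linux, hypothetical two-card selection
export ONEAPI_DEVICE_SELECTOR=level_zero:0,1
./ollama serve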
Hi @FilipLaurentiu , we have never met such an error before. What are your CPU & GPU? And have you ever run other ollama models successfully, or run another version of the ollama portable zip successfully, on your machine?
# windows
set ONEAPI_DEVICE_SELECTOR=level_zero:0
ollama serve
Hello
Windows ollama-ipex-llm-2.3.0b20250612-win working (llama3.1 8b, q4_k_m)
Windows ollama-ipex-llm-2.3.0b20250630-win working (llama3.1 8b, q4_k_m) ollama 0.9.3
Besides, currently for ollama runner, multi-card is not supported as there remain some bugs
!!! 2 GPUs (set ONEAPI_DEVICE_SELECTOR=level_zero:2,3)
Windows ollama-ipex-llm-2.3.0b20250630-win working (llama3.1 8b, q4_k_m) ollama 0.9.3
Windows ollama-ipex-llm-2.3.0b20250630-win NOT WORKING (gemma3n:e4b) ollama 0.9.3
Windows device manager
The first time I used 1 GPU (https://github.com/intel/ipex-llm/issues/13245#issuecomment-3026985373) - and it didn't work.
For Windows (ReBAR enabled, 16GB size, x16 Gen 3 PCIe), Xeon 2699 v3 x2, hyper-threading enabled, 4x A770:
SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0
SYCL_CACHE_PERSISTENT=1
ZES_ENABLE_SYSMAN=1
And yes, portable and ipex-llm[cpp] were working before, but with another version; with llama3.1 8b it works.
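A minimal sketch of how those variables would be applied in a Windows cmd session before launching the portable build (ordering assumed; the variables themselves are the ones listed above):
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0
set SYCL_CACHE_PERSISTENT=1
set ZES_ENABLE_SYSMAN=1
ollama serve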
env-check
root@xpu:/home/arc/llm/models# source venv/bin/activate
(venv) root@xpu:/home/arc/llm/models# bash env-check.sh
-----------------------------------------------------------------
PYTHON_VERSION=3.10.12
-----------------------------------------------------------------
Transformers is not installed.
-----------------------------------------------------------------
PyTorch is not installed.
-----------------------------------------------------------------
ipex-llm WARNING: Package(s) not found: ipex-llm
-----------------------------------------------------------------
IPEX is not installed.
-----------------------------------------------------------------
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
CPU family: 6
Model: 63
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
Stepping: 2
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4589.90
-----------------------------------------------------------------
Total CPU Memory: 125.753 GB
Memory Type: DDR4
-----------------------------------------------------------------
Operating System:
Ubuntu 22.04.5 LTS \n \l
-----------------------------------------------------------------
Linux xpu 6.8.0-60-generic #63~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 22 19:00:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
CLI:
Version: 1.2.41.20250414
Build ID: 002a9706
Service:
Version: 1.2.41.20250414
Build ID: 002a9706
Level Zero Version: 1.14.0
-----------------------------------------------------------------
env-check.sh: line 154: clinfo: command not found
-----------------------------------------------------------------
Driver related package version:
ii intel-fw-gpu 2025.13.2-398~22.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-i915-dkms 1.23.10.92.231129.101+i141-1 all Out of tree i915 driver.
ii intel-level-zero-gpu 1.3.27642.69-803.145~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
-----------------------------------------------------------------
env-check.sh: line 167: sycl-ls: command not found
igpu not detected
-----------------------------------------------------------------
xpu-smi is properly installed.
-----------------------------------------------------------------
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0005-0000-000856a08086 |
| | PCI BDF Address: 0000:05:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 1 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0009-0000-000856a08086 |
| | PCI BDF Address: 0000:09:00.0 |
| | DRM Device: /dev/dri/card2 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 2 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0085-0000-000856a08086 |
| | PCI BDF Address: 0000:85:00.0 |
| | DRM Device: /dev/dri/card3 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 3 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0089-0000-000856a08086 |
| | PCI BDF Address: 0000:89:00.0 |
| | DRM Device: /dev/dri/card4 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
GPU0 Memory size=16G
GPU1 Memory size=16G
GPU2 Memory size=16G
GPU3 Memory size=16G
-----------------------------------------------------------------
05:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation Device 6012
Flags: bus master, fast devsel, latency 0, IRQ 75, NUMA node 0, IOMMU group 71
Memory at 90000000 (64-bit, non-prefetchable) [size=16M]
Memory at 38000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at <ignored> [disabled]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
--
09:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation Device 6012
Flags: bus master, fast devsel, latency 0, IRQ 78, NUMA node 0, IOMMU group 76
Memory at c4000000 (64-bit, non-prefetchable) [size=16M]
Memory at 3b800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at c5000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
--
85:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation Device 6012
Flags: bus master, fast devsel, latency 0, IRQ 81, NUMA node 2, IOMMU group 8
Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
Memory at 3f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at fb000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
--
89:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation Device 6012
Flags: bus master, fast devsel, latency 0, IRQ 84, NUMA node 2, IOMMU group 13
Memory at f8000000 (64-bit, non-prefetchable) [size=16M]
Memory at 3f000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at f9000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
-----------------------------------------------------------------
Hi @FilipLaurentiu , we have never met such an error before. What are your CPU & GPU? And have you ever run other ollama models successfully, or run another version of the ollama portable zip successfully, on your machine?
Xeon w3-2435 and Intel B580 OC version.
I usually run the docker container with the intelanalytics/ipex-llm-inference-cpp-xpu:latest image. I run different models (gemma3:4b-it-fp16, deepseek-r1:14b), but I encounter problems from time to time.
I am able to run ollama (not the Intel version) on my CPU and run gemma3:4b-it-fp16 without issues.
It's the first time I've tried the compiled nightly version. If you want, I can test specific versions and models on my hardware and send you the logs; I can't read them...
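(For context, a hedged sketch of a typical launch of that image, following the common Intel GPU docker pattern of passing /dev/dri through; the exact flags are assumed, not taken from this post:)
sudo docker run -itd --net=host --device=/dev/dri --name=ipex-llm intelanalytics/ipex-llm-inference-cpp-xpu:latest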
I just wanted to test the stable version, but this seems to have other problems. This might be a separate issue that needs to be opened.
➜ ollama-ipex-llm-2.2.0-ubuntu ./start-ollama.sh
SIGSEGV: segmentation violation
PC=0x7f47c8aede6f m=0 sigcode=1 addr=0x8
signal arrived during cgo execution
goroutine 1 gp=0xc0000061c0 m=0 mp=0x55e86cda54c0 [syscall]:
runtime.cgocall(0x55e86c05ad30, 0xc00050d960)
runtime/cgocall.go:167 +0x4b fp=0xc00050d938 sp=0xc00050d900 pc=0x55e86b4b80cb
ollama/llama/llamafile._Cfunc_llama_print_system_info()
_cgo_gotypes.go:837 +0x4c fp=0xc00050d960 sp=0xc00050d938 pc=0x55e86b87b86c
ollama/llama/llamafile.PrintSystemInfo()
ollama/llama/llamafile/llama.go:70 +0x79 fp=0xc00050d9a8 sp=0xc00050d960 pc=0x55e86b87c8f9
ollama/cmd.NewCLI()
ollama/cmd/cmd.go:1427 +0xc08 fp=0xc00050df30 sp=0xc00050d9a8 pc=0x55e86c0532e8
main.main()
ollama/main.go:12 +0x13 fp=0xc00050df50 sp=0xc00050df30 pc=0x55e86c05a173
runtime.main()
runtime/proc.go:272 +0x29d fp=0xc00050dfe0 sp=0xc00050df50 pc=0x55e86b489add
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00050dfe8 sp=0xc00050dfe0 pc=0x55e86b4c6ba1
goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008afa8 sp=0xc00008af88 pc=0x55e86b4be7ce
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.forcegchelper()
runtime/proc.go:337 +0xb8 fp=0xc00008afe0 sp=0xc00008afa8 pc=0x55e86b489e18
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008afe8 sp=0xc00008afe0 pc=0x55e86b4c6ba1
created by runtime.init.7 in goroutine 1
runtime/proc.go:325 +0x1a
goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008b780 sp=0xc00008b760 pc=0x55e86b4be7ce
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.bgsweep(0xc00004a080)
runtime/mgcsweep.go:317 +0xdf fp=0xc00008b7c8 sp=0xc00008b780 pc=0x55e86b4744bf
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc00008b7e0 sp=0xc00008b7c8 pc=0x55e86b468b05
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008b7e8 sp=0xc00008b7e0 pc=0x55e86b4c6ba1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x55e86c2056a8?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008bf78 sp=0xc00008bf58 pc=0x55e86b4be7ce
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.(*scavengerState).park(0x55e86cda2dc0)
runtime/mgcscavenge.go:425 +0x49 fp=0xc00008bfa8 sp=0xc00008bf78 pc=0x55e86b471e89
runtime.bgscavenge(0xc00004a080)
runtime/mgcscavenge.go:658 +0x59 fp=0xc00008bfc8 sp=0xc00008bfa8 pc=0x55e86b472419
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc00008bfe0 sp=0xc00008bfc8 pc=0x55e86b468aa5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008bfe8 sp=0xc00008bfe0 pc=0x55e86b4c6ba1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc00008a648?, 0x55e86b45f005?, 0xb0?, 0x1?, 0xc0000061c0?)
runtime/proc.go:424 +0xce fp=0xc00008a620 sp=0xc00008a600 pc=0x55e86b4be7ce
runtime.runfinq()
runtime/mfinal.go:193 +0x107 fp=0xc00008a7e0 sp=0xc00008a620 pc=0x55e86b467b87
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008a7e8 sp=0xc00008a7e0 pc=0x55e86b4c6ba1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:163 +0x3d
goroutine 6 gp=0xc000200e00 m=nil [chan receive]:
runtime.gopark(0xc00008c760?, 0x55e86b59a205?, 0x40?, 0x28?, 0x55e86c61c000?)
runtime/proc.go:424 +0xce fp=0xc00008c718 sp=0xc00008c6f8 pc=0x55e86b4be7ce
runtime.chanrecv(0xc000040310, 0x0, 0x1)
runtime/chan.go:639 +0x41c fp=0xc00008c790 sp=0xc00008c718 pc=0x55e86b4581fc
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:489 +0x12 fp=0xc00008c7b8 sp=0xc00008c790 pc=0x55e86b457db2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1784 +0x2f fp=0xc00008c7e0 sp=0xc00008c7b8 pc=0x55e86b46bb6f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x55e86b4c6ba1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1779 +0x96
goroutine 7 gp=0xc0002016c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008cf38 sp=0xc00008cf18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00008cfc8 sp=0xc00008cf38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00008cfe0 sp=0xc00008cfc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008cfe8 sp=0xc00008cfe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 8 gp=0xc000201880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008d738 sp=0xc00008d718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00008d7c8 sp=0xc00008d738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00008d7e0 sp=0xc00008d7c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008d7e8 sp=0xc00008d7e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000086738 sp=0xc000086718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc0000867c8 sp=0xc000086738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0000867e0 sp=0xc0000867c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 9 gp=0xc000201a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00008df38 sp=0xc00008df18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00008dfc8 sp=0xc00008df38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00008dfe0 sp=0xc00008dfc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 34 gp=0xc000104380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011c738 sp=0xc00011c718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011c7c8 sp=0xc00011c738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011c7e0 sp=0xc00011c7c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011c7e8 sp=0xc00011c7e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 10 gp=0xc000201c00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 11 gp=0xc000201dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000118f38 sp=0xc000118f18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc000118fc8 sp=0xc000118f38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000118fe0 sp=0xc000118fc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000118fe8 sp=0xc000118fe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 12 gp=0xc0004a4000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000119738 sp=0xc000119718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc0001197c8 sp=0xc000119738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0001197e0 sp=0xc0001197c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001197e8 sp=0xc0001197e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 13 gp=0xc0004a41c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 14 gp=0xc0004a4380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 35 gp=0xc000104540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011cf38 sp=0xc00011cf18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011cfc8 sp=0xc00011cf38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011cfe0 sp=0xc00011cfc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011cfe8 sp=0xc00011cfe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 15 gp=0xc0004a4540 m=nil [GC worker (idle)]:
runtime.gopark(0xc939efea09?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 16 gp=0xc0004a4700 m=nil [GC worker (idle)]:
runtime.gopark(0xc939ee4801?, 0x1?, 0x1c?, 0x3f?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 36 gp=0xc000104700 m=nil [GC worker (idle)]:
runtime.gopark(0xc939ee4dd5?, 0x1?, 0x40?, 0x5d?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011d738 sp=0xc00011d718 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011d7c8 sp=0xc00011d738 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011d7e0 sp=0xc00011d7c8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011d7e8 sp=0xc00011d7e0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 37 gp=0xc0001048c0 m=nil [GC worker (idle)]:
runtime.gopark(0x55e86cdcc940?, 0x1?, 0xe8?, 0xd1?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x55e86b4be7ce
runtime.gcBgMarkWorker(0xc000041730)
runtime/mgc.go:1412 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x55e86b46ae69
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x55e86b46ad45
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x55e86b4c6ba1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
rax 0x55e87819fbc0
rbx 0x0
rcx 0x7ffd5cfd92d0
rdx 0x4
rdi 0x0
rsi 0x0
rbp 0x0
rsp 0x7ffd5cfd91a0
r8 0x0
r9 0x7f47df9f6ac0
r10 0x1
r11 0x1a09408f589b8a82
r12 0x0
r13 0x55e877aa6ea0
r14 0x7ffd5cfd92d0
r15 0x4
rip 0x7f47c8aede6f
rflags 0x10206
cs 0x33
fs 0x0
gs 0x0
@FilipLaurentiu they are working on resolving it.
@savvadesogle @yizhangliu we have made a fix for gemma3n; you could try whether this new release (https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250710-ubuntu.tgz / https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250708-win.zip) works.
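A hedged sketch of fetching and unpacking the Ubuntu build (standard wget/tar usage; the extracted directory name is assumed from the reports below):
wget https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250710-ubuntu.tgz
tar -xzf ollama-ipex-llm-2.3.0b20250710-ubuntu.tgz
cd ollama-ipex-llm-2.3.0b20250710-ubuntu
./start-ollama.sh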
https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250708-win.zip
it's working! Thank you! windows 11, A770, 2699v3 gemma3n:e4b 15cb39fd9394 7.5 GB
https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250710-ubuntu.tgz
Unfortunately, I have a problem on Ubuntu 24.04 (6.8.0-63-generic):
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:4514
!!! Little update (export SYCL_CACHE_PERSISTENT=1):
it works without this parameter.
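In other words, the workaround is to drop that export before launching; a minimal sketch (assuming the stock start-ollama.sh):
# in start-ollama.sh, comment out the cache-persistence line:
# export SYCL_CACHE_PERSISTENT=1
./start-ollama.sh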
arc@xpu:~/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu$ ./ollama run gemma3n:e4b --verbose
>>> hi
Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details
arc@xpu:~/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu$ ./ollama run llama3.1:8b --verbose
>>> how to make a tea?
Error: model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details
server log:
start-ollama.sh
#!/bin/bash
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
export SYCL_CACHE_PERSISTENT=1
export OLLAMA_KEEP_ALIVE=10m
# [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
# export OLLAMA_HOST='127.0.0.1:11434'
# [optional] if you want to run on single GPU, use below command to limit GPU may improve performance
# export ONEAPI_DEVICE_SELECTOR=level_zero:0
# If you have more than one dGPU, you can use a configuration like the one below depending on your setup; it will use the first and second card.
export ONEAPI_DEVICE_SELECTOR="level_zero:0"
./ollama serve
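(A hedged sanity check once the server is up, using the /api/version route that the GIN logs above show is registered:)
curl http://127.0.0.1:11434/api/version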
OLLAMA SERVE (ollama-ipex-llm-2.3.0b20250710-ubuntu)
arc@xpu:~/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu$ bash start-ollama.sh
time=2025-07-11T11:28:16.548+03:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/arc/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-07-11T11:28:16.548+03:00 level=INFO source=images.go:476 msg="total blobs: 9"
time=2025-07-11T11:28:16.549+03:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers) [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers) [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers) [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers) [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers) [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers) [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers) [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers) [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers) [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers) [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers) [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers) [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers) [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers) [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers) [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers) [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers) [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers) [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers) [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers) time=2025-07-11T11:28:16.549+03:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.3)" time=2025-07-11T11:28:16.549+03:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" time=2025-07-11T11:28:16.549+03:00 level=INFO source=gpu.go:218 msg="using Intel GPU" time=2025-07-11T11:28:16.562+03:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="125.8 GiB" available="121.6 GiB" [GIN] 2025/07/11 - 11:28:19 | 200 | 110.094µs | 127.0.0.1 | HEAD "/" [GIN] 2025/07/11 - 11:28:20 | 200 | 282.148071ms | 127.0.0.1 | POST "/api/show" time=2025-07-11T11:28:20.507+03:00 level=INFO source=server.go:135 msg="system memory" total="125.8 GiB" free="121.6 GiB" free_swap="8.0 GiB" time=2025-07-11T11:28:20.509+03:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=36 layers.offload=0 layers.split="" memory.available="[121.6 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.0 GiB" 
memory.required.partial="0 B" memory.required.kv="280.0 MiB" memory.required.allocations="[5.0 GiB]" memory.weights.total="2.6 GiB" memory.weights.repeating="2.2 GiB" memory.weights.nonrepeating="420.4 MiB" memory.graph.full="2.0 GiB" memory.graph.partial="3.7 GiB" time=2025-07-11T11:28:20.690+03:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/arc/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu/ollama-bin runner --ollama-engine --model /home/arc/.ollama/models/blobs/sha256-38e8dcc30df4eb0e29eaf5c74ba6ce3f2cd66badad50768fc14362acfb8b8cb6 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 36 --no-mmap --parallel 2 --port 37851" time=2025-07-11T11:28:20.691+03:00 level=INFO source=sched.go:483 msg="loaded runners" count=1 time=2025-07-11T11:28:20.691+03:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding" time=2025-07-11T11:28:20.692+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding" time=2025-07-11T11:28:20.758+03:00 level=INFO source=runner.go:925 msg="starting ollama engine" time=2025-07-11T11:28:20.758+03:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:37851" time=2025-07-11T11:28:20.914+03:00 level=INFO source=ggml.go:96 msg="" architecture=gemma3n file_type=Q4_K_M name="" description="" num_tensors=847 num_key_values=40 time=2025-07-11T11:28:20.943+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model" load_backend: loaded SYCL backend from /home/arc/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu/libggml-sycl.so load_backend: loaded CPU backend from /home/arc/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu/libggml-cpu-haswell.so time=2025-07-11T11:28:20.986+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) Running with Environment Variables: GGML_SYCL_DEBUG: 0 GGML_SYCL_DISABLE_OPT: 1 GGML_SYCL_DISABLE_GRAPH: 1 GGML_SYCL_PRIORITIZE_DMMV: 0 Build with Macros: GGML_SYCL_FORCE_MMQ: no GGML_SYCL_F16: no Found 1 SYCL devices: | | | | |Max | |Max |Global | | | | | | |compute|Max work|sub |mem | |
| ID | Device Type | Name | Version | units | group | group | size | Driver version |
|---|---|---|---|---|---|---|---|---|
| 0 | [level_zero:gpu:0] | Intel Arc A770 Graphics | 12.55 | 512 | 1024 | 32 | 16225M | 1.6.33944+12 |
| SYCL Optimization Feature: | ||||||||
| ID | Device Type | Reorder | ||||||
| -- | ------------------- | ------- | ||||||
| 0 | [level_zero:gpu:0] | Y | ||||||
time=2025-07-11T11:28:21.183+03:00 level=INFO source=ggml.go:390 msg="model weights" buffer=SYCL0 size="2.6 GiB"
time=2025-07-11T11:28:21.183+03:00 level=INFO source=ggml.go:390 msg="model weights" buffer=CPU size="4.8 GiB"
time=2025-07-11T11:28:21.183+03:00 level=INFO source=ggml.go:419 msg="Sycl device count" count=1
time=2025-07-11T11:28:21.212+03:00 level=INFO source=ggml.go:712 msg="compute graph" backend=SYCL0 buffer_type=SYCL0 size="271.7 MiB"
time=2025-07-11T11:28:21.212+03:00 level=INFO source=ggml.go:712 msg="compute graph" backend=SYCL0 buffer_type=SYCL0 size="0 B"
time=2025-07-11T11:28:21.212+03:00 level=INFO source=ggml.go:712 msg="compute graph" backend=CPU buffer_type=CPU size="1.5 GiB"
time=2025-07-11T11:28:28.978+03:00 level=INFO source=server.go:657 msg="llama runner started in 8.29 seconds"
[GIN] 2025/07/11 - 11:28:28 | 200 | 8.932052856s | 127.0.0.1 | POST "/api/generate"
The program was built for 1 devices
Build program log for 'Intel(R) Arc(TM) A770 Graphics':
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:4514
time=2025-07-11T11:28:32.735+03:00 level=ERROR source=server.go:827 msg="post predict" error="Post \"http://127.0.0.1:37851/completion\": EOF"
[GIN] 2025/07/11 - 11:28:32 | 200 | 1.53028641s | 127.0.0.1 | POST "/api/chat"
[GIN] 2025/07/11 - 11:38:12 | 200 | 64.94µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/07/11 - 11:38:12 | 200 | 170.278155ms | 127.0.0.1 | POST "/api/show"
time=2025-07-11T11:38:12.825+03:00 level=INFO source=server.go:135 msg="system memory" total="125.8 GiB" free="120.7 GiB" free_swap="8.0 GiB"
time=2025-07-11T11:38:12.825+03:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=33 layers.offload=0 layers.split="" memory.available="[120.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.2 GiB" memory.required.partial="0 B" memory.required.kv="512.0 MiB" memory.required.allocations="[5.2 GiB]" memory.weights.total="4.3 GiB" memory.weights.repeating="3.9 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="677.5 MiB"
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /home/arc/.ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 15
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 66 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.58 GiB (4.89 BPW)
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 8.03 B
print_info: general.name = Meta Llama 3.1 8B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128009 '<|eot_id|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-07-11T11:38:13.361+03:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/home/arc/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu/ollama-bin runner --model /home/arc/.ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 36 --no-mmap --parallel 2 --port 45907"
time=2025-07-11T11:38:13.362+03:00 level=INFO source=sched.go:483 msg="loaded runners" count=2
time=2025-07-11T11:38:13.362+03:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-07-11T11:38:13.363+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
using override patterns: []
time=2025-07-11T11:38:13.427+03:00 level=INFO source=runner.go:851 msg="starting go runner"
load_backend: loaded SYCL backend from /home/arc/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu/libggml-sycl.so
load_backend: loaded CPU backend from /home/arc/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu/libggml-cpu-haswell.so
time=2025-07-11T11:38:13.476+03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.0.OPENMP=1 CPU.0.AARCH64_REPACK=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-07-11T11:38:13.476+03:00 level=INFO source=runner.go:911 msg="Server listening on 127.0.0.1:45907"
time=2025-07-11T11:38:13.615+03:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server loading model"
llama_model_load_from_file_impl: using device SYCL0 (Intel(R) Arc(TM) A770 Graphics) - 15473 MiB free
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /home/arc/.ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 15
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 66 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.58 GiB (4.89 BPW)
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 4096
print_info: n_layer = 32
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 14336
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 8B
print_info: model params = 8.03 B
print_info: general.name = Meta Llama 3.1 8B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128009 '<|eot_id|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
load_tensors: CPU model buffer size = 281.81 MiB
load_tensors: SYCL0 model buffer size = 4403.49 MiB
llama_context: constructing llama_context
llama_context: n_seq_max = 2
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 2048
llama_context: n_batch = 1024
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 500000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Running with Environment Variables:
  GGML_SYCL_DEBUG: 0
  GGML_SYCL_DISABLE_OPT: 1
  GGML_SYCL_DISABLE_GRAPH: 1
  GGML_SYCL_PRIORITIZE_DMMV: 0
Build with Macros:
  GGML_SYCL_FORCE_MMQ: no
  GGML_SYCL_F16: no
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|  12.55|    512|    1024|   32| 16225M|         1.6.33944+12|
SYCL Optimization Feature:
|ID|        Device Type|Reorder|
|--|-------------------|-------|
| 0| [level_zero:gpu:0]|      Y|
llama_context: SYCL_Host output buffer size = 1.01 MiB
llama_kv_cache_unified: kv_size = 4096, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1, padding = 32
llama_kv_cache_unified: SYCL0 KV buffer size = 512.00 MiB
llama_kv_cache_unified: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
llama_context: SYCL0 compute buffer size = 258.50 MiB
llama_context: SYCL_Host compute buffer size = 16.01 MiB
llama_context: graph nodes = 966 (with bs=512), 870 (with bs=1)
llama_context: graph splits = 2
time=2025-07-11T11:38:36.637+03:00 level=WARN source=runner.go:786 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
time=2025-07-11T11:38:36.970+03:00 level=INFO source=server.go:657 msg="llama runner started in 23.61 seconds"
[GIN] 2025/07/11 - 11:38:36 | 200 | 24.383788113s | 127.0.0.1 | POST "/api/generate"
The program was built for 1 devices
Build program log for 'Intel(R) Arc(TM) A770 Graphics':
Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:4514
time=2025-07-11T11:39:21.062+03:00 level=ERROR source=server.go:827 msg="post predict" error="Post \"http://127.0.0.1:45907/completion\": EOF"
[GIN] 2025/07/11 - 11:39:21 | 200 | 380.176741ms | 127.0.0.1 | POST "/api/chat"
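For anyone who wants to reproduce the crash without a UI: the failing request in the log above is an ordinary ollama /api/chat call. A minimal sketch, assuming the server is on the default 127.0.0.1:11434 and the gemma3n:e4b model from this thread is already pulled (the prompt text is arbitrary):

```bash
# Minimal reproduction of the failing request from the log above.
# On the broken build, this should return/close with EOF after the
# SYCL exception instead of a normal JSON response.
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "gemma3n:e4b",
  "messages": [{"role": "user", "content": "hello"}],
  "stream": false
}'
```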
env-check.sh log:
(venv) root@xpu:/home/arc/Downloads# bash env-check.sh
-----------------------------------------------------------------
PYTHON_VERSION=3.12.3
-----------------------------------------------------------------
Transformers is not installed.
-----------------------------------------------------------------
PyTorch is not installed.
-----------------------------------------------------------------
ipex-llm WARNING: Package(s) not found: ipex-llm
-----------------------------------------------------------------
IPEX is not installed.
-----------------------------------------------------------------
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
BIOS Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz CPU @ 2.3GHz
BIOS CPU family: 179
CPU family: 6
Model: 63
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
Stepping: 2
-----------------------------------------------------------------
Total CPU Memory: 125.752 GB
Memory Type: DDR4
-----------------------------------------------------------------
Operating System: Ubuntu 24.04.2 LTS
Linux xpu 6.8.0-63-generic #66-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun 13 20:25:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
CLI:
    Version: 1.3.0.20250707
    Build ID: 3db7de07
Service:
    Version: 1.3.0.20250707
    Build ID: 3db7de07
    Level Zero Version: 1.21.9
Driver UUID 32352e32-322e-3333-3934-340000000000 Driver Version 25.22.33944
Driver UUID 32352e32-322e-3333-3934-340000000000 Driver Version 25.22.33944
Driver UUID 32352e32-322e-3333-3934-340000000000 Driver Version 25.22.33944
Driver UUID 32352e32-322e-3333-3934-340000000000 Driver Version 25.22.33944
Driver related package version:
env-check.sh: line 167: sycl-ls: command not found
igpu not detected
xpu-smi is properly installed.
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0005-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:05:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 1         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0009-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:09:00.0                                                        |
|           | DRM Device: /dev/dri/card2                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 2         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0084-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:84:00.0                                                        |
|           | DRM Device: /dev/dri/card3                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 3         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0088-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:88:00.0                                                        |
|           | DRM Device: /dev/dri/card4                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
GPU0 Memory size=16G
GPU1 Memory size=16G
GPU2 Memory size=16G
GPU3 Memory size=16G
05:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation DG2 [Arc A770]
Flags: bus master, fast devsel, latency 0, IRQ 78, NUMA node 0, IOMMU group 68
Memory at 90000000 (64-bit, non-prefetchable) [size=16M]
Memory at 38000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at [disabled]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
09:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation DG2 [Arc A770]
Flags: bus master, fast devsel, latency 0, IRQ 81, NUMA node 0, IOMMU group 73
Memory at c4000000 (64-bit, non-prefetchable) [size=16M]
Memory at 3b800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at c5000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
84:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation DG2 [Arc A770]
Flags: bus master, fast devsel, latency 0, IRQ 85, NUMA node 2, IOMMU group 7
Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
Memory at 3f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at fb000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
88:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: ASRock Incorporation DG2 [Arc A770]
Flags: bus master, fast devsel, latency 0, IRQ 90, NUMA node 2, IOMMU group 12
Memory at f8000000 (64-bit, non-prefetchable) [size=16M]
Memory at 3f000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at f9000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/ollama-ipex-llm-2.3.0b20250710-ubuntu.tgz
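A side note on the env-check output above: `sycl-ls: command not found` (and the resulting "igpu not detected") usually just means the oneAPI environment was not sourced in that shell, not that the driver is broken; xpu-smi still enumerates all four A770s. A minimal sketch, assuming the default oneAPI install prefix:

```bash
# Put sycl-ls (and the rest of the oneAPI tools) on PATH for this shell.
# /opt/intel/oneapi is the default install prefix; adjust if yours differs.
source /opt/intel/oneapi/setvars.sh
sycl-ls   # should now list the level_zero:gpu devices seen by the runtime
```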
If I don't export the SYCL_CACHE_PERSISTENT=1 variable, ollama works.
Is this normal speed for gemma3n:e4b (file_type=Q4_K_M) on an A770?
total duration: 3.617709154s
load duration: 217.168292ms
prompt eval count: 10 token(s)
prompt eval duration: 359.475262ms
prompt eval rate: 27.82 tokens/s
eval count: 49 token(s)
eval duration: 3.039351201s
eval rate: 16.12 tokens/s
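For reference, the numbers above are the statistics that ollama prints when a prompt is run with the --verbose flag; a minimal way to reproduce the measurement (the prompt text is arbitrary):

```bash
# --verbose makes ollama print total/load/prompt-eval/eval durations
# and token rates after the reply, as in the block above.
ollama run gemma3n:e4b --verbose "Why is the sky blue?"
```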
Power consumption is only 90-95 W for gemma3n.
At the same time, llama3.1:8b (q4_K_M) draws 170 W.
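If anyone wants to check these power numbers themselves, xpu-smi (already installed per the env-check above) can stream per-device telemetry. A sketch, assuming device 0 is the card running the model and that metric IDs 0 and 1 map to GPU utilization and GPU power in this xpu-smi version:

```bash
# Stream GPU utilization (%) and power (W) for device 0 once per second
# while a generation runs in another terminal.
xpu-smi dump -d 0 -m 0,1 -i 1
```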
@rnwang04 It's OK. Thank you!
If I don't export the SYCL_CACHE_PERSISTENT=1 variable, ollama works.
Yes, unsetting SYCL_CACHE_PERSISTENT solves this issue; we will consider updating our quickstart & script later. 😊
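Until the quickstart and start script are updated, the workaround is simply to make sure the variable is not set in the shell that launches the server. A minimal sketch (the paths and serve command assume the portable Ubuntu layout shown in the logs; start the portable build however you normally do):

```bash
# Drop SYCL_CACHE_PERSISTENT from the environment before starting ollama.
unset SYCL_CACHE_PERSISTENT
echo "${SYCL_CACHE_PERSISTENT:-unset}"   # sanity check: prints "unset"
cd ~/Downloads/ollama-ipex-llm-2.3.0b20250710-ubuntu
./ollama serve
```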
Is this normal speed for gemma3n:e4b (file_type=Q4_K_M) on an A770?
On our side, we obtain a similar speed with gemma3n:e4b as yours. The current unsatisfactory speed is because we place this model's fp16 lm_head on the CPU to work around the GPU OOM error mentioned above, which drags down overall performance. If a better solution turns up, for example if this model switches to a lower-precision lm_head, we may try to optimize the speed.
On our side, we obtain a similar speed with gemma3n:e4b as yours.
On Windows via LM Studio (Vulkan), the speed is 24.23 tok/sec (with the default 4K context).
And 72-74% of the GPU is used.
(https://huggingface.co/lmstudio-community/gemma-3n-E4B-it-text-GGUF/tree/main gemma-3n-E4B-it-Q4_K_M.gguf)
llama.cpp release b5849 (commit 6efcd65)
v1.38.0 Vulkan llama.cpp (Windows)