ipex-llm

ollama ipex-llm crashes when running tinyllama, llama 3.1, and qwen 2.5

Open funky-gibbon opened this issue 6 months ago • 10 comments

Note to Maintainers: I previously filed this bug as issue #13221. It was closed, but no comments or explanation were visible on the issue page, preventing me from understanding the resolution or next steps. I am re-filing this detailed report to ensure it is seen and addressed. I hope I have not failed to follow a necessary protocol of some kind. Please advise me if I have.

  1. Problem Description:

When attempting to run any model (e.g., tinyllama, llama3.1, qwen2.5:7b) using the ollama-ipex-llm optimized builds on an Intel Arc A770 discrete GPU, the llama runner process consistently terminates with a SYCL error during the ggml_sycl_op_mul_mat operation. intel_gpu_top shows blitter activity, indicating data transfer to the GPU, but no significant "Render/3D" or "Compute" utilization occurs before the crash. This issue persists across different IPEX-LLM build versions and various models.

  2. Steps to Reproduce:

Operating System: Clean installation of Ubuntu 24.04 LTS (Noble Numbat).

Kernel Version: 6.11.0-26-generic.

Intel GPU Driver Installation:

* intel-opencl-icd was installed via sudo apt install intel-opencl-icd.
* The latest discrete GPU drivers (25.18.33578.6) were manually downloaded from Intel's website and installed.

IPEX-LLM Ollama Setup (Primary Test: Build 2.3.0b20250612):

* Downloaded ollama-ipex-llm-2.3.0b20250612-ubuntu.tgz and extracted it to ~/ollama_ipex/ollama-ipex-llm-2.3.0b20250612-ubuntu/.
* Created a /usr/local/bin/start_ollama_ipex.sh wrapper script and an /etc/systemd/system/ollama.service systemd unit file.
* The start_ollama_ipex.sh script included all export variables from IPEX-LLM's start-ollama.sh (e.g., OLLAMA_NUM_GPU=999, ZES_ENABLE_SYSMAN=1, SYCL_CACHE_PERSISTENT=1, OLLAMA_KEEP_ALIVE=10m).
* SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 was tested (uncommented, then re-commented), but showed no change in behavior.
* The systemd service was configured to run as user michael-fothergill and point to /usr/local/bin/start_ollama_ipex.sh.
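
For reference, a minimal sketch of what such a wrapper might look like, reconstructed only from the variables and paths quoted in this report (the actual script is not reproduced in the issue, so treat it as an illustration):

```bash
#!/bin/bash
# /usr/local/bin/start_ollama_ipex.sh -- hypothetical reconstruction from the
# variables and paths mentioned in this report, not the original file.
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
export SYCL_CACHE_PERSISTENT=1
export OLLAMA_KEEP_ALIVE=10m
# export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1   # tested both ways; no change observed

# Path to the extracted IPEX-LLM Ollama build under test.
exec /home/michael-fothergill/ollama_ipex/ollama-ipex-llm-2.3.0b20250612-ubuntu/ollama serve
```

The ollama.service unit then simply points ExecStart at this wrapper and sets User=michael-fothergill, after which the service is reloaded with sudo systemctl daemon-reload && sudo systemctl restart ollama.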

Model Run Attempts (Build 2.3.0b20250612):

* ollama pull tinyllama, then ollama run tinyllama "Hello, what is your purpose?" Result: SYCL error termination.
* ollama pull llama3.1, then ollama run llama3.1 "Write a short, evocative poem..." Result: SYCL error termination.
* ollama pull qwen2.5:7b, then ollama run qwen2.5:7b "What is the capital of France?" Result: SYCL error termination.

IPEX-LLM Ollama Setup (Secondary Test: Older Build 2.3.0b20250429):

* Downloaded ollama-ipex-llm-2.3.0b20250429-ubuntu.tgz and extracted it to ~/ollama_ipex_older/ollama-ipex-llm-2.3.0b20250429-ubuntu/.
* Updated start_ollama_ipex.sh to point to the ollama binary in this older directory.
* SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 was kept commented out for this test.
* Ran sudo systemctl daemon-reload && sudo systemctl restart ollama.

Model Run Attempt (Older Build 2.3.0b20250429):

* ollama run tinyllama "Hello, what is your purpose?" Result: SYCL error termination (same as the newer build).

  3. Expected Behavior:

The IPEX-LLM optimized Ollama should leverage the Intel Arc A770 discrete GPU for inference, resulting in high "Render/3D" or "Compute" utilization in intel_gpu_top and successful text generation.

  4. Actual Behavior:

The llama runner process consistently terminates with the following SYCL error during matrix multiplication (ggml_sycl_op_mul_mat), regardless of the model size (tinyllama, llama3.1, qwen2.5:7b) or IPEX-LLM build version tested:

Error: llama runner process has terminated: error:CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].row_low, dev[i].row_high, src1_ncols, src1_padded_col_size, stream)): Meet error in this line code! in function ggml_sycl_op_mul_mat at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3260 /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:117: SYCL error

(Note: The line number may vary slightly, e.g., 3277 vs 3260, but the core error remains the same.)

intel_gpu_top shows blitter activity (data transfer) but no significant "Render/3D" or "Compute" engine utilization before the crash.

  5. System Information:

* GPU: Intel Arc A770 (Desktop)
* Operating System: Ubuntu 24.04 LTS (Noble Numbat)
* Kernel Version: 6.11.0-26-generic
* Intel GPU Driver Version: 25.18.33578.6 (manually installed from Intel's website). Confirmed intel-opencl-icd also installed.
* sycl-ls Output:
  [level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A770 Graphics 12.55.8 [1.6.33578.600000]
  [opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 7 8700F 8-Core Processor OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]
* IPEX-LLM Ollama Builds Tested: ollama-ipex-llm-2.3.0b20250612-ubuntu.tgz, ollama-ipex-llm-2.3.0b20250429-ubuntu.tgz
* Relevant Environment Variables set in /usr/local/bin/start_ollama_ipex.sh (consistent across tests):

export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
export SYCL_CACHE_PERSISTENT=1
export OLLAMA_KEEP_ALIVE=10m

funky-gibbon avatar Jun 16 '25 13:06 funky-gibbon

Hey man, I used to get this error before, so I think I can help. Btw, can you send me your error logs?

Ellie-Williams-007 avatar Jun 17 '25 01:06 Ellie-Williams-007

Many thanks for taking the time to comment on the post. You are most kind. I have the error logs; they are from the ollama-ipex-llm optimised build, not from previous non-IPEX-LLM attempts. Here is the output from 'journalctl -u ollama.service -n 50 --no-pager' capturing recent attempts:

base) michael-fothergill@michael-fothergill-B650-EAGLE-AX:~$ journalctl -u ollama.service -n 50 --no-pager Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: r14 0x16 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: r15 0x7e4ad9633ee4 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: rip 0x7e4ad809eb2c Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: rflags 0x246 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: cs 0x33 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: fs 0x0 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: gs 0x0 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: time=2025-06-14T12:45:06.871+01:00 level=ERROR source=server.go:474 msg="llama runner terminated" error="exit status 2" Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: time=2025-06-14T12:45:06.935+01:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].row_low, dev[i].row_high, src1_ncols, src1_padded_col_size, stream)): Meet error in this line code!\n in function ggml_sycl_op_mul_mat at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3260\n/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:117: SYCL error" Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: [GIN] 2025/06/14 - 12:45:06 | 500 | 5.019591164s | 127.0.0.1 | POST "/api/generate" Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: Stopping ollama.service - Ollama Service... Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: ollama.service: Deactivated successfully. Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: Stopped ollama.service - Ollama Service. Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: ollama.service: Consumed 2min 4.165s CPU time, 5.7G memory peak, 0B memory swap peak. -- Boot b930c29b84bc4dcb868a6088c8a334a1 -- Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX systemd[1]: Started ollama.service - Ollama Service. 
Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: 2025/06/17 15:55:30 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/michael-fothergill/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.609+01:00 level=INFO source=images.go:432 msg="total blobs: 15" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.610+01:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached. Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production. 
Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: - using env: export GIN_MODE=release Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: - using code: gin.SetMode(gin.ReleaseMode) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX 
start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.610+01:00 level=INFO source=routes.go:1297 msg="Listening on 127.0.0.1:11434 (version 0.0.0)" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.611+01:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.618+01:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.618+01:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.0 GiB" available="29.7 GiB" (base) michael-fothergill@michael-fothergill-B650-EAGLE-AX:~$

A few things I've noticed from these logs that might be important:

* You can see the llama runner process has terminated error during ggml_sycl_op_mul_mat (e.g., around Jun 14 12:45:06), which is the core issue I'm reporting.
* Interestingly, in the more recent logs (Jun 17 15:55:30), it also states msg="no compatible GPUs were discovered". This is despite sycl-ls correctly showing my Intel Arc A770. It seems Ollama isn't detecting the GPU for offloading at startup.

My system is an Intel Arc A770 on Ubuntu 24.04 LTS with driver version 25.18.33578.6. Let me know if you need any further information or specific diagnostic outputs.

Cheers

MF

funky-gibbon avatar Jun 17 '25 15:06 funky-gibbon

I made an error. I forgot to convert the logs into markdown format. Here is the corrected log output:

journalctl -u ollama.service -n 50 --no-pager Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: r14 0x16 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: r15 0x7e4ad9633ee4 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: rip 0x7e4ad809eb2c Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: rflags 0x246 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: cs 0x33 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: fs 0x0 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: gs 0x0 Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: time=2025-06-14T12:45:06.871+01:00 level=ERROR source=server.go:474 msg="llama runner terminated" error="exit status 2" Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: time=2025-06-14T12:45:06.935+01:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].row_low, dev[i].row_high, src1_ncols, src1_padded_col_size, stream)): Meet error in this line code!\n in function ggml_sycl_op_mul_mat at /home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3260\n/home/runner/_work/llm.cpp/llm.cpp/ollama-llama-cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:117: SYCL error" Jun 14 12:45:06 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[7900]: [GIN] 2025/06/14 - 12:45:06 | 500 | 5.019591164s | 127.0.0.1 | POST "/api/generate" Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: Stopping ollama.service - Ollama Service... Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: ollama.service: Deactivated successfully. Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: Stopped ollama.service - Ollama Service. Jun 14 12:53:05 michael-fothergill-B650-EAGLE-AX systemd[1]: ollama.service: Consumed 2min 4.165s CPU time, 5.7G memory peak, 0B memory swap peak. -- Boot b930c29b84bc4dcb868a6088c8a334a1 -- Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX systemd[1]: Started ollama.service - Ollama Service. 
Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: 2025/06/17 15:55:30 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:10m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/michael-fothergill/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* [suspicious link removed] [suspicious link removed] [suspicious link removed]:* [suspicious link removed]:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.609+01:00 level=INFO source=images.go:432 msg="total blobs: 15" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.610+01:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached. Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production. 
Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: - using env: export GIN_MODE=release Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: - using code: gin.SetMode(gin.ReleaseMode) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX 
start_ollama_ipex.sh[2834]: [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers) Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.610+01:00 level=INFO source=routes.go:1297 msg="Listening on 127.0.0.1:11434 (version 0.0.0)" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.611+01:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.618+01:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered" Jun 17 15:55:30 michael-fothergill-B650-EAGLE-AX start_ollama_ipex.sh[2834]: time=2025-06-17T15:55:30.618+01:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.0 GiB" available="29.7 GiB"

Cheers

MF

funky-gibbon avatar Jun 17 '25 15:06 funky-gibbon

I think it's caused by the AMD CPU; they only support Intel CPUs (I mentioned it to the Intel team before). Also, msg="no compatible GPUs were discovered" is not relevant, just a useless log.

Ellie-Williams-007 avatar Jun 18 '25 01:06 Ellie-Williams-007

That sounds odd to me. Are there any people you know on the Intel team I could email and ask about that personally, or a user group run by Intel that I could join and politely ask that question in? You make it sound like I would be better off buying an AMD graphics card and not bothering with an Intel GPU. I want to make the Intel Arc A770 GPU work if I can. Suggestions most welcome. Cheers MF

funky-gibbon avatar Jun 18 '25 10:06 funky-gibbon

That sounds odd to me. Are there any people you know on the Intel team I could email and ask about that personally, or a user group run by Intel that I could join and politely ask that question in? You make it sound like I would be better off buying an AMD graphics card and not bothering with an Intel GPU. I want to make the Intel Arc A770 GPU work if I can. Suggestions most welcome. Cheers MF

Yep, I'm also trying to get it to work. Sadly no working ollama here atm...

Mailootje avatar Jun 20 '25 00:06 Mailootje

For anyone interested, here is my log: https://pastebin.com/UkHVNSKu

Mailootje avatar Jun 20 '25 00:06 Mailootje

I talked to the Intel team; an AMD CPU should work. This log is hard to read. If you can send me a readable log, one saved by Ollama directly rather than via journalctl, that would be great.
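
Something like this should give you a plain log file (just a sketch based on the wrapper script you described above; adjust paths to your setup):

```bash
# Sketch: capture Ollama's own output to a plain file instead of going through journalctl.
sudo systemctl stop ollama

export OLLAMA_DEBUG=1     # debug logging; this variable is visible in the server config dump above
/usr/local/bin/start_ollama_ipex.sh > ~/ollama_run.log 2>&1 &

# In a second terminal, reproduce the crash, then attach ~/ollama_run.log.
ollama run tinyllama "Hello, what is your purpose?"
```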

Ellie-Williams-007 avatar Jun 20 '25 01:06 Ellie-Williams-007

We have fixed the bug now. We removed the existing installation and reinstalled it again following Intel's graphics installation guide. See here:

Title: Update: Issue #13223 - Resolved - Successful Intel Arc A770 GPU Utilization with IPEX-LLM Ollama

Status: Resolved

Problem Statement (Original Issue #13223): The original bug report, #13223, detailed persistent SYCL errors and crashes during ggml_sycl_op_mul_mat when attempting to run various LLM models (e.g., tinyllama, llama3.1, qwen2.5:7b) using IPEX-LLM optimized Ollama builds on an Intel Arc A770 discrete GPU on Ubuntu 24.04 LTS. intel_gpu_top showed blitter activity but no significant "Render/3D" or "Compute" utilization before the crashes.

Resolution: The issue has been successfully resolved, and the system is now operating as expected with the Intel Arc A770 GPU actively accelerating Ollama inference. The resolution was achieved by meticulously re-following the official Intel GPU installation instructions for IPEX-LLM and Ollama, ensuring all dependencies and environment configurations were correctly applied.
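
For anyone retracing this, the rough checks below are what confirmed the working state on this system; they use only the tools already referenced in this thread and are meant as a sketch rather than an official procedure.

```bash
# 1. The GPU is visible to SYCL / Level Zero (compare with the sycl-ls output quoted below).
sycl-ls

# 2. The service loads the SYCL backend from the IPEX-LLM build.
sudo systemctl restart ollama
journalctl -fu ollama          # look for "loaded SYCL backend" and the Arc A770 device line

# 3. The Compute engine is busy while a prompt runs (use two terminals).
sudo intel_gpu_top                               # terminal 1
ollama run llama3.1 "Tell me a story..."         # terminal 2
```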

Actual Behavior (Original - now resolved): Previously, the llama runner process consistently terminated with SYCL errors during matrix multiplication (ggml_sycl_op_mul_mat), regardless of the model size or IPEX-LLM build version tested. This crash no longer occurs.

Key Observations Confirming Resolution:

Successful SYCL Backend Loading & GPU Detection: Logs (e.g., journalctl -fu ollama) now consistently show the SYCL backend being loaded and the Intel Arc A770 Graphics card being fully detected with its allocated VRAM:

load_backend: loaded SYCL backend from /home/michael-fothergill/ollama_ipex/ollama-ipex-llm-2.3.0b20250612-ubuntu/libggml-sycl.so
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 12.55| 512| 1024| 32| 16225M| 1.6.33578+11|
| 0| [level_zero:gpu:0]| Y|

Full Model Layer Offloading to GPU: Crucially, during model loading and inference, the logs now confirm that all layers of the model are being successfully offloaded to the GPU:

level=INFO source=server.go:154 msg=offload library=oneapi layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[14.6 GiB]" memory.required.full="6.0 GiB" ...

This indicates that the llama3.1 model (with 33 layers) is entirely resident and processed on the GPU.

High GPU Compute Utilization During Inference: Monitoring with sudo intel_gpu_top during active inference (e.g., ollama run llama3.1 "Tell me a story...") now consistently shows high utilization of the GPU's "Compute" engine, while "Render/3D" remains low, which is the expected behavior for an LLM workload:

Compute: ~90%

Render/3D: ~1-2%

(Audibly, the PC's fans also became busier during inference, which correlated with the high GPU activity.)

Successful Tokens Per Second (TPS) Reporting: The ollama run command now correctly reports the inference speed in tokens per second at the end of the output, confirming effective performance. For instance, a test with llama3.1 yielded:

llama3.1:latest 21.05 tokens/s Done in 2.92s

(Note: For very short responses, this metric might occasionally flash too quickly to be visually perceived in the terminal, but it is indeed calculated and present in the output stream).
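
If the figures scroll away too quickly, the upstream Ollama CLI's --verbose flag prints the timing and token-rate statistics after each reply; assuming the flag is available in this build, a run such as the following keeps the numbers on screen:

```bash
# Sketch: --verbose prints per-request timing and token-rate statistics after the reply.
ollama run llama3.1 --verbose "What is the capital of France?"
```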

System Information (Now working as expected):

GPU: Intel Arc A770 (Desktop)

Operating System: Ubuntu 24.04 LTS (Noble Numbat)

Kernel Version: 6.11.0-26-generic

Intel GPU Driver Version: 25.18.33578.6 (manually installed from Intel's website). Confirmed intel-opencl-icd also installed.

sycl-ls Output (confirming Level-Zero and OpenCL device detection):

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A770 Graphics 12.55.8 [1.6.33578.600000]
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 7 8700F 8-Core Processor OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]

IPEX-LLM Ollama Build Confirmed Working: ollama-ipex-llm-2.3.0b20250612-ubuntu.tgz

Relevant Environment Variables (set in /usr/local/bin/start_ollama_ipex.sh and successfully utilized):

export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
export SYCL_CACHE_PERSISTENT=1
export OLLAMA_KEEP_ALIVE=10m

(Note: SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 was tested with no change in the current working state.)

Recommendation: This issue can now be closed as resolved. The detailed steps and observations confirm that the Intel Arc A770 GPU is functioning correctly with the IPEX-LLM Ollama setup, providing significant acceleration for LLM inference.

funky-gibbon avatar Jun 20 '25 15:06 funky-gibbon

PS Although the GPU is working now, I am still seeing a lot of CPU usage as well. I will do some more work on the apportionment of data put into the context window etc. versus the VRAM used by the LLM to process prompts; I don't know enough about that yet. I am aiming for a restful CPU and a busy GPU when using the LLMs. Suggestions on this are welcome.
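
One quick way to see how a loaded model is split between system RAM and VRAM, assuming the standard ollama ps command behaves the same in this build:

```bash
# Sketch: show how a loaded model is split between CPU and GPU memory.
ollama ps
# Illustrative output shape (not captured from this system):
# NAME             ID              SIZE      PROCESSOR    UNTIL
# llama3.1:latest  xxxxxxxxxxxx    6.0 GB    100% GPU     10 minutes from now
```

If the KV cache for a large context window does not fit in VRAM alongside the model weights, some layers stay on the CPU, which would show up as a CPU/GPU split in the PROCESSOR column.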

funky-gibbon avatar Jun 20 '25 17:06 funky-gibbon