
SYCL error when using an Intel GPU with ipex-llm on Ubuntu

Open ligjn opened this issue 3 months ago • 4 comments

My device:

- System: Ubuntu 24.03
- ipex-llm: ipex-llm-ollama


Error behavior: an error occurs when executing `ollama run qwen2.5:0.5b` (screenshot attached).

A more complete error log, as shown in the terminal where `ollama serve` is running:

[GIN] 2025/09/15 - 19:29:14 | 500 |  442.334217ms |       127.0.0.1 | POST     "/api/generate"
^Croot@dcg:/opt/aog/engine/ollama/ollama# ./ollama serve
time=2025-09-15T19:29:45.094+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:16677 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/aog/engine/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=images.go:476 msg="total blobs: 7"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-09-15T19:29:45.094+08:00 level=INFO source=routes.go:1288 msg="Listening on [::]:16677 (version 0.9.3)"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-09-15T19:29:45.100+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="30.9 GiB" available="25.8 GiB"
[GIN] 2025/09/15 - 19:29:48 | 200 |      27.735µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/09/15 - 19:29:48 | 200 |   28.881689ms |       127.0.0.1 | POST     "/api/show"
time=2025-09-15T19:29:48.779+08:00 level=INFO source=server.go:135 msg="system memory" total="30.9 GiB" free="25.8 GiB" free_swap="0 B"
time=2025-09-15T19:29:48.780+08:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=25 layers.offload=0 layers.split="" memory.available="[25.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="732.6 MiB" memory.required.partial="0 B" memory.required.kv="48.0 MiB" memory.required.allocations="[732.6 MiB]" memory.weights.total="373.7 MiB" memory.weights.repeating="235.8 MiB" memory.weights.nonrepeating="137.9 MiB" memory.graph.full="298.5 MiB" memory.graph.partial="405.0 MiB"
llama_model_loader: loaded meta data with 34 key-value pairs and 290 tensors from /var/lib/aog/engine/ollama/models/blobs/sha256-c5396e06af294bd101b30dce59131a76d2b773e76950acc870eda801d3ab0515 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 0.5B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 0.5B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-0...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 0.5B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-0.5B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 24
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_0:  132 tensors
llama_model_loader: - type q8_0:   13 tensors
llama_model_loader: - type q4_K:   12 tensors
llama_model_loader: - type q6_K:   12 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 373.71 MiB (6.35 BPW) 
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 494.03 M
print_info: general.name     = Qwen2.5 0.5B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-09-15T19:29:48.917+08:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/opt/aog/engine/ollama/ollama/ollama-bin runner --model /var/lib/aog/engine/ollama/models/blobs/sha256-c5396e06af294bd101b30dce59131a76d2b773e76950acc870eda801d3ab0515 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 8 --no-mmap --parallel 2 --port 42509"
time=2025-09-15T19:29:48.917+08:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-09-15T19:29:48.917+08:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-09-15T19:29:48.917+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
using override patterns: []
time=2025-09-15T19:29:48.949+08:00 level=INFO source=runner.go:851 msg="starting go runner"
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html
SIGABRT: abort
PC=0x7f96256a70fc m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 gp=0xc000002380 m=0 mp=0x1ecf7c0 [syscall]:
runtime.cgocall(0x1157b90, 0xc000439538)
        /usr/local/go/src/runtime/cgocall.go:167 +0x4b fp=0xc000439510 sp=0xc0004394d8 pc=0x48398b
github.com/ollama/ollama/ml/backend/ggml/ggml/src._Cfunc_ggml_backend_load_all_from_path(0x26b6a910)
        _cgo_gotypes.go:195 +0x3a fp=0xc000439538 sp=0xc000439510 pc=0x830dfa
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.func1.1({0xc000042014, 0x1d})
        /home/arda/ruonan/ollama-internal/ml/backend/ggml/ggml/src/ggml.go:97 +0xf5 fp=0xc0004395d0 sp=0xc000439538 pc=0x830895
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.func1()
        /home/arda/ruonan/ollama-internal/ml/backend/ggml/ggml/src/ggml.go:98 +0x526 fp=0xc000439860 sp=0xc0004395d0 pc=0x8306e6
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.OnceFunc.func2()
        /usr/local/go/src/sync/oncefunc.go:27 +0x62 fp=0xc0004398a8 sp=0xc000439860 pc=0x8300e2
sync.(*Once).doSlow(0x0?, 0x0?)
        /usr/local/go/src/sync/once.go:78 +0xab fp=0xc000439900 sp=0xc0004398a8 pc=0x4991ab
sync.(*Once).Do(0x0?, 0x0?)
        /usr/local/go/src/sync/once.go:69 +0x19 fp=0xc000439920 sp=0xc000439900 pc=0x4990d9
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.OnceFunc.func3()
        /usr/local/go/src/sync/oncefunc.go:32 +0x2d fp=0xc000439950 sp=0xc000439920 pc=0x83004d
github.com/ollama/ollama/llama.BackendInit()
        /home/arda/ruonan/ollama-internal/llama/llama.go:57 +0x16 fp=0xc000439960 sp=0xc000439950 pc=0x8349f6
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0001aa020, 0xf, 0x10})
        /home/arda/ruonan/ollama-internal/runner/llamarunner/runner.go:853 +0x7d4 fp=0xc000439d08 sp=0xc000439960 pc=0x8f3bf4
github.com/ollama/ollama/runner.Execute({0xc0001aa010?, 0x0?, 0x0?})
        /home/arda/ruonan/ollama-internal/runner/runner.go:22 +0xd4 fp=0xc000439d30 sp=0xc000439d08 pc=0x979374
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000271400?, {0x140da22?, 0x4?, 0x140da26?})
        /home/arda/ruonan/ollama-internal/cmd/cmd.go:1529 +0x45 fp=0xc000439d58 sp=0xc000439d30 pc=0x10d5b45
github.com/spf13/cobra.(*Command).execute(0xc000114f08, {0xc0004b4ff0, 0xf, 0xf})
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:940 +0x894 fp=0xc000439e78 sp=0xc000439d58 pc=0x5ff694
github.com/spf13/cobra.(*Command).ExecuteC(0xc0000e6908)
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:1068 +0x3a5 fp=0xc000439f30 sp=0xc000439e78 pc=0x5ffee5
github.com/spf13/cobra.(*Command).Execute(...)
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
main.main()
        /home/arda/ruonan/ollama-internal/main.go:12 +0x4d fp=0xc000439f50 sp=0xc000439f30 pc=0x10d65cd
runtime.main()
        /usr/local/go/src/runtime/proc.go:283 +0x28b fp=0xc000439fe0 sp=0xc000439f50 pc=0x45390b
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000439fe8 sp=0xc000439fe0 pc=0x48eca1

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000072fa8 sp=0xc000072f88 pc=0x486e0e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:441
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:348 +0xb3 fp=0xc000072fe0 sp=0xc000072fa8 pc=0x453c53
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x48eca1
created by runtime.init.7 in goroutine 1
        /usr/local/go/src/runtime/proc.go:336 +0x1a

goroutine 18 gp=0xc0000aa380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00006e780 sp=0xc00006e760 pc=0x486e0e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:441
runtime.bgsweep(0xc0000b8000)
        /usr/local/go/src/runtime/mgcsweep.go:316 +0xdf fp=0xc00006e7c8 sp=0xc00006e780 pc=0x43e45f
runtime.gcenable.gowrap1()
        /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc00006e7e0 sp=0xc00006e7c8 pc=0x4328c5
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x48eca1
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:204 +0x66

goroutine 19 gp=0xc0000aa540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x15d2cc8?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00006ef78 sp=0xc00006ef58 pc=0x486e0e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x1ecc9a0)
        /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00006efa8 sp=0xc00006ef78 pc=0x43bea9
runtime.bgscavenge(0xc0000b8000)
        /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc00006efc8 sp=0xc00006efa8 pc=0x43c439
runtime.gcenable.gowrap2()
        /usr/local/go/src/runtime/mgc.go:205 +0x25 fp=0xc00006efe0 sp=0xc00006efc8 pc=0x432865
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x48eca1
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:205 +0xa5

goroutine 34 gp=0xc000184380 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000072688?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000072630 sp=0xc000072610 pc=0x486e0e
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:196 +0x107 fp=0xc0000727e0 sp=0xc000072630 pc=0x431887
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x48eca1
created by runtime.createfing in goroutine 1
        /usr/local/go/src/runtime/mfinal.go:166 +0x3d

goroutine 35 gp=0xc000184e00 m=nil [chan receive]:
runtime.gopark(0xc0001f5a40?, 0xc000116018?, 0x60?, 0x47?, 0x56cac8?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000304718 sp=0xc0003046f8 pc=0x486e0e
runtime.chanrecv(0xc000180310, 0x0, 0x1)
        /usr/local/go/src/runtime/chan.go:664 +0x445 fp=0xc000304790 sp=0xc000304718 pc=0x4232a5
runtime.chanrecv1(0x0?, 0x0?)
        /usr/local/go/src/runtime/chan.go:506 +0x12 fp=0xc0003047b8 sp=0xc000304790 pc=0x422e32
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        /usr/local/go/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1799 +0x2f fp=0xc0003047e0 sp=0xc0003047b8 pc=0x435a0f
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003047e8 sp=0xc0003047e0 pc=0x48eca1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1794 +0x79

goroutine 36 gp=0xc000185180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000304f38 sp=0xc000304f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000304fc8 sp=0xc000304f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000304fe0 sp=0xc000304fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000304fe8 sp=0xc000304fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc0000aa700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00006f738 sp=0xc00006f718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00006f7c8 sp=0xc00006f738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00006f7e0 sp=0xc00006f7c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000185340 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a908c21f?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000305738 sp=0xc000305718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003057c8 sp=0xc000305738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003057e0 sp=0xc0003057c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003057e8 sp=0xc0003057e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9090238?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000300738 sp=0xc000300718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003007c8 sp=0xc000300738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003007e0 sp=0xc0003007c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003007e8 sp=0xc0003007e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9066e73?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000300f38 sp=0xc000300f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000300fc8 sp=0xc000300f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000300fe0 sp=0xc000300fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000300fe8 sp=0xc000300fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 52 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9088da0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000301738 sp=0xc000301718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003017c8 sp=0xc000301738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003017e0 sp=0xc0003017c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003017e8 sp=0xc0003017e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 53 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9087c16?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000301f38 sp=0xc000301f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000301fc8 sp=0xc000301f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000301fe0 sp=0xc000301fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000301fe8 sp=0xc000301fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc000185500 m=nil [GC worker (idle)]:
runtime.gopark(0x1f7a980?, 0x1?, 0x8c?, 0xb6?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000305f38 sp=0xc000305f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000305fc8 sp=0xc000305f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000305fe0 sp=0xc000305fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000305fe8 sp=0xc000305fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

rax    0x0
rbx    0xa4c6
rcx    0x7f96256a70fc
rdx    0x6
rdi    0xa4c6
rsi    0xa4c6
rbp    0x7f9626d33480
rsp    0x7ffd1e2d4bf0
r8     0x0
r9     0x7ffd1e2d4780
r10    0x8
r11    0x246
r12    0x26b735b0
r13    0x6
r14    0x0
r15    0x7f9612f38790
rip    0x7f96256a70fc
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
time=2025-09-15T19:29:49.167+08:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/09/15 - 19:29:49 | 500 |   432.59435ms |       127.0.0.1 | POST     "/api/generate"

— ligjn, Sep 23 '25

Check your driver bro

— Ellie-Williams-007, Sep 24 '25

Thank you for your reply.

The Ubuntu 24.03 Desktop system ships with the Intel xe driver by default, but intel_gpu_top appears to work only with i915. So what I'm confused about is: how can I determine whether my driver is working properly for ipex-llm-ollama? Is the preinstalled xe driver unusable?

[screenshot]

Here is a complete screenshot of the device information:

[screenshot: device information]

— ligjn, Sep 25 '25
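To help answer the question above about verifying the driver, here is a minimal diagnostic sketch. It only uses standard tools: `/dev/dri` render nodes are created by both the xe and i915 kernel drivers, `lspci -k` shows which kernel module is bound to the GPU, and `sycl-ls` (shipped with Intel oneAPI, and bundled with ipex-llm releases) lists the devices the SYCL runtime can actually see — which is exactly what the `No device of requested type available` exception is about. This is a sketch, not an official troubleshooting procedure:

```shell
# 1. Does the kernel expose GPU render nodes? (both xe and i915 create these)
ls -l /dev/dri/ 2>/dev/null || echo "no /dev/dri: no kernel GPU driver is loaded"

# 2. Which kernel module is actually bound to the GPU?
lspci -k 2>/dev/null | grep -i -A3 'vga\|display' || echo "lspci unavailable or no GPU found"

# 3. What does the SYCL runtime itself see?
if command -v sycl-ls >/dev/null 2>&1; then
  sycl-ls
else
  echo "sycl-ls not found: source the oneAPI environment (e.g. setvars.sh) first"
fi
```

If step 3 lists no `level_zero` GPU device while steps 1 and 2 look healthy, the problem is likely in the user-space compute stack (Level Zero / compute runtime packages) rather than the kernel driver itself.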

Same issue here with dual Intel Arc B50 Pros; some models work, and some throw this error intermittently.

— stearnsbq, Oct 03 '25

Same issue with an Intel Arc B50 and an RTX A2000. Sometimes it works, but only on the RTX A2000.

— RetroZelda, Oct 04 '25
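For the mixed and multi-GPU setups in the last two comments, one thing worth trying is constraining which device the SYCL runtime selects. `ONEAPI_DEVICE_SELECTOR` is a documented oneAPI environment variable; the `level_zero:0` value below is illustrative (the right index depends on the `sycl-ls` output on your machine), and whether it fixes the intermittent failures here is an assumption, not a confirmed workaround:

```shell
# Restrict the SYCL runtime to a single Level Zero GPU
# (index 0 is illustrative; match it to your sycl-ls output)
export ONEAPI_DEVICE_SELECTOR="level_zero:0"
echo "SYCL restricted to: $ONEAPI_DEVICE_SELECTOR"

# Then launch the server in the same shell, e.g.:
#   ./ollama serve
```

On a system with an NVIDIA card alongside the Arc GPU, pinning to the Intel device this way at least makes the failure deterministic, which is easier to debug than intermittent behavior.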