Intel UHD Graphics 620 supported?
Is this GPU supported? ipex-llm works when I use "ollama run" directly, but as soon as I feed it with context, it crashes.
|ID| Device Type| Name|Version|Max compute units|Max work group|Max sub group|Global mem size| Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel UHD Graphics 620| 9.1| 24| 256| 32| 15030M| 1.5.30872.220000|
llama_kv_cache_init: SYCL0 KV buffer size = 1012.00 MiB
llama_new_context_with_model: KV self size = 1012.00 MiB, K (f16): 506.00 MiB, V (f16): 506.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.50 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 258.50 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 23.82 MiB
llama_new_context_with_model: graph nodes = 902
llama_new_context_with_model: graph splits = 2
time=2025-04-08T05:04:51.465+08:00 level=WARN source=runner.go:892 msg="%s: warming up the model with an empty run - please wait ... " !BADKEY=loadModel
time=2025-04-08T05:04:51.596+08:00 level=INFO source=server.go:610 msg="llama runner started in 15.06 seconds"
get_memory_info: [warning] ext_intel_free_memory is not supported (export/set ZES_ENABLE_SYSMAN=1 to support), use total memory as free memory
llama_load_model_from_file: using device SYCL0 (Intel(R) UHD Graphics 620) - 14334 MiB free
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 15
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 66 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.58 GiB (4.89 BPW)
llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llama_model_load: vocab only - skipping tensors
ollama-lib: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:439: auto ggml_sycl_op_sdp_xmx_casual(fp16 *, fp16 *, fp16 *, fp16 *, fp16 *, float *, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, size_t, float *, float, sycl::queue &)::(anonymous class)::operator()() const: Assertion `false' failed.
SIGABRT: abort
PC=0x7f0f340419fc m=10 sigcode=18446744073709551610
signal arrived during cgo execution
goroutine 36 gp=0xc000230a80 m=10 mp=0xc000604708 [syscall]:
runtime.cgocall(0x55b174a5a260, 0xc000099b90)
runtime/cgocall.go:167 +0x4b fp=0xc000099b68 sp=0xc000099b30 pc=0x55b173eb7feb
ollama/llama/llamafile._Cfunc_llama_decode(0x7f0ecbc94bc0, {0x200, 0x7f0ec8094590, 0x0, 0x0, 0x7f0ec8094da0, 0x7f0ec8059e90, 0x7f0ec80484b0, 0x7f0ec81c46a0})
_cgo_gotypes.go:557 +0x4f fp=0xc000099b90 sp=0xc000099b68 pc=0x55b17427a40f
ollama/llama/llamafile.(*Context).Decode.func1(0x55b17428918b?, 0x7f0ecbc94bc0?)
ollama/llama/llamafile/llama.go:143 +0xf5 fp=0xc000099c80 sp=0xc000099b90 pc=0x55b17427d035
ollama/llama/llamafile.(*Context).Decode(0xc000099d70?, 0x0?)
ollama/llama/llamafile/llama.go:143 +0x13 fp=0xc000099cc8 sp=0xc000099c80 pc=0x55b17427ceb3
ollama/llama/runner.(*Server).processBatch(0xc00051a120, 0xc0003d04e0, 0xc000099f20)
ollama/llama/runner/runner.go:434 +0x23f fp=0xc000099ee0 sp=0xc000099cc8 pc=0x55b174287e5f
ollama/llama/runner.(*Server).run(0xc00051a120, {0x55b17500b1d0, 0xc000708690})
ollama/llama/runner/runner.go:342 +0x1d5 fp=0xc000099fb8 sp=0xc000099ee0 pc=0x55b174287895
ollama/llama/runner.Execute.gowrap2()
ollama/llama/runner/runner.go:1006 +0x28 fp=0xc000099fe0 sp=0xc000099fb8 pc=0x55b17428cb08
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000099fe8 sp=0xc000099fe0 pc=0x55b173ec6ac1
created by ollama/llama/runner.Execute in goroutine 1
ollama/llama/runner/runner.go:1006 +0xde5
goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc0003f5560 sp=0xc0003f5540 pc=0x55b173ebe6ee
runtime.netpollblock(0xc0005155b0?, 0x73e55506?, 0xb1?)
runtime/netpoll.go:575 +0xf7 fp=0xc0003f5598 sp=0xc0003f5560 pc=0x55b173e82357
internal/poll.runtime_pollWait(0x7f0f347c6680, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc0003f55b8 sp=0xc0003f5598 pc=0x55b173ebd9e5
internal/poll.(*pollDesc).wait(0xc000476280?, 0x900000036?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0003f55e0 sp=0xc0003f55b8 pc=0x55b173f45007
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000476280)
internal/poll/fd_unix.go:620 +0x295 fp=0xc0003f5688 sp=0xc0003f55e0 pc=0x55b173f4a3d5
net.(*netFD).accept(0xc000476280)
net/fd_unix.go:172 +0x29 fp=0xc0003f5740 sp=0xc0003f5688 pc=0x55b173fb2aa9
net.(*TCPListener).accept(0xc00071ca00)
net/tcpsock_posix.go:159 +0x1e fp=0xc0003f5790 sp=0xc0003f5740 pc=0x55b173fc871e
net.(*TCPListener).Accept(0xc00071ca00)
net/tcpsock.go:372 +0x30 fp=0xc0003f57c0 sp=0xc0003f5790 pc=0x55b173fc75d0
net/http.(*onceCloseListener).Accept(0xc00051ae10?)
<autogenerated>:1 +0x24 fp=0xc0003f57d8 sp=0xc0003f57c0 pc=0x55b174240d24
net/http.(*Server).Serve(0xc000722870, {0x55b175008ee0, 0xc00071ca00})
net/http/server.go:3330 +0x30c fp=0xc0003f5908 sp=0xc0003f57d8 pc=0x55b174218cac
ollama/llama/runner.Execute({0xc000036130?, 0x0?, 0x0?})
ollama/llama/runner/runner.go:1027 +0x11a9 fp=0xc0003f5ca8 sp=0xc0003f5908 pc=0x55b17428c7e9
ollama/cmd.NewCLI.func2(0xc00051c400?, {0x55b174a5ed1d?, 0x4?, 0x55b174a5ed21?})
ollama/cmd/cmd.go:1430 +0x45 fp=0xc0003f5cd0 sp=0xc0003f5ca8 pc=0x55b174a594e5
github.com/spf13/cobra.(*Command).execute(0xc000518008, {0xc0007221e0, 0xf, 0xf})
github.com/spf13/[email protected]/command.go:985 +0xaaa fp=0xc0003f5e58 sp=0xc0003f5cd0 pc=0x55b17404be8a
github.com/spf13/cobra.(*Command).ExecuteC(0xc000683208)
github.com/spf13/[email protected]/command.go:1117 +0x3ff fp=0xc0003f5f30 sp=0xc0003f5e58 pc=0x55b17404c75f
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/[email protected]/command.go:1041
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/[email protected]/command.go:1034
main.main()
ollama/main.go:12 +0x4d fp=0xc0003f5f50 sp=0xc0003f5f30 pc=0x55b174a59b4d
runtime.main()
runtime/proc.go:272 +0x29d fp=0xc0003f5fe0 sp=0xc0003f5f50 pc=0x55b173e899fd
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0003f5fe8 sp=0xc0003f5fe0 pc=0x55b173ec6ac1
goroutine 2 gp=0xc000007340 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x55b173ebe6ee
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.forcegchelper()
runtime/proc.go:337 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x55b173e89d38
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x55b173ec6ac1
created by runtime.init.7 in goroutine 1
runtime/proc.go:325 +0x1a
goroutine 3 gp=0xc000007500 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x55b173ebe6ee
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.bgsweep(0xc0000ba000)
runtime/mgcsweep.go:317 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x55b173e743df
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x55b173e68a25
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x55b173ec6ac1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc0000076c0 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x55b174c04ed8?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x55b173ebe6ee
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.(*scavengerState).park(0x55b1757a2da0)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x55b173e71da9
runtime.bgscavenge(0xc0000ba000)
runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x55b173e72339
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x55b173e689c5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x55b173ec6ac1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc000084648?, 0x55b173e5ef25?, 0xb0?, 0x1?, 0xc0000061c0?)
runtime/proc.go:424 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x55b173ebe6ee
runtime.runfinq()
runtime/mfinal.go:193 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x55b173e67aa7
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x55b173ec6ac1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:163 +0x3d
goroutine 6 gp=0xc000180e00 m=nil [chan receive]:
runtime.gopark(0xc000086760?, 0x55b173f9a125?, 0x40?, 0xe8?, 0x55b17501c400?)
runtime/proc.go:424 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x55b173ebe6ee
runtime.chanrecv(0xc00004a460, 0x0, 0x1)
runtime/chan.go:639 +0x41c fp=0xc000086790 sp=0xc000086718 pc=0x55b173e5811c
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:489 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x55b173e57cd2
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1784 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x55b173e6ba8f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x55b173ec6ac1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1779 +0x96
goroutine 7 gp=0xc000181880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 18 gp=0xc000230380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 34 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x55b1757cc920?, 0x1?, 0x5?, 0x2a?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc00050a738 sp=0xc00050a718 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc00050a7c8 sp=0xc00050a738 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc00050a7e0 sp=0xc00050a7c8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00050a7e8 sp=0xc00050a7e0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 19 gp=0xc000230540 m=nil [GC worker (idle)]:
runtime.gopark(0xdea45b9ea56?, 0x3?, 0x3b?, 0xde?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 20 gp=0xc000230700 m=nil [GC worker (idle)]:
runtime.gopark(0xdea45b9c523?, 0x3?, 0x61?, 0xc2?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 8 gp=0xc000181a40 m=nil [GC worker (idle)]:
runtime.gopark(0xdea45b2cac6?, 0x1?, 0xed?, 0x89?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 9 gp=0xc000181c00 m=nil [GC worker (idle)]:
runtime.gopark(0xdea45b1d34c?, 0x1?, 0xb1?, 0x40?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 10 gp=0xc000181dc0 m=nil [GC worker (idle)]:
runtime.gopark(0xdea45b1cff0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000506738 sp=0xc000506718 pc=0x55b173ebe6ee
runtime.gcBgMarkWorker(0xc00004ba40)
runtime/mgc.go:1412 +0xe9 fp=0xc0005067c8 sp=0xc000506738 pc=0x55b173e6ad89
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1328 +0x25 fp=0xc0005067e0 sp=0xc0005067c8 pc=0x55b173e6ac65
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0005067e8 sp=0xc0005067e0 pc=0x55b173ec6ac1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1328 +0x105
goroutine 91 gp=0xc000504540 m=nil [select]:
runtime.gopark(0xc00005da68?, 0x2?, 0x6e?, 0x41?, 0xc00005d834?)
runtime/proc.go:424 +0xce fp=0xc00005d650 sp=0xc00005d630 pc=0x55b173ebe6ee
runtime.selectgo(0xc00005da68, 0xc00005d830, 0xa6e?, 0x0, 0x1?, 0x1)
runtime/select.go:335 +0x7a5 fp=0xc00005d778 sp=0xc00005d650 pc=0x55b173e9b9e5
ollama/llama/runner.(*Server).completion(0xc00051a120, {0x55b1750090f0, 0xc00012b500}, 0xc000329cc0)
ollama/llama/runner/runner.go:696 +0xab6 fp=0xc00005dac0 sp=0xc00005d778 pc=0x55b174289cd6
ollama/llama/runner.(*Server).completion-fm({0x55b1750090f0?, 0xc00012b500?}, 0x55b174222a87?)
<autogenerated>:1 +0x36 fp=0xc00005daf0 sp=0xc00005dac0 pc=0x55b17428d3b6
net/http.HandlerFunc.ServeHTTP(0xc000720700?, {0x55b1750090f0?, 0xc00012b500?}, 0x0?)
net/http/server.go:2220 +0x29 fp=0xc00005db18 sp=0xc00005daf0 pc=0x55b1742152a9
net/http.(*ServeMux).ServeHTTP(0x55b173e5ef25?, {0x55b1750090f0, 0xc00012b500}, 0xc000329cc0)
net/http/server.go:2747 +0x1ca fp=0xc00005db68 sp=0xc00005db18 pc=0x55b1742171aa
net/http.serverHandler.ServeHTTP({0x55b175005cb0?}, {0x55b1750090f0?, 0xc00012b500?}, 0x6?)
net/http/server.go:3210 +0x8e fp=0xc00005db98 sp=0xc00005db68 pc=0x55b17423470e
net/http.(*conn).serve(0xc00051ae10, {0x55b17500b198, 0xc000706ff0})
net/http/server.go:2092 +0x5d0 fp=0xc00005dfb8 sp=0xc00005db98 pc=0x55b174213c50
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3360 +0x28 fp=0xc00005dfe0 sp=0xc00005dfb8 pc=0x55b1742190a8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00005dfe8 sp=0xc00005dfe0 pc=0x55b173ec6ac1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3360 +0x485
goroutine 71 gp=0xc000230fc0 m=nil [IO wait]:
runtime.gopark(0x55b173e63405?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:424 +0xce fp=0xc000081da8 sp=0xc000081d88 pc=0x55b173ebe6ee
runtime.netpollblock(0x55b173ee1918?, 0x73e55506?, 0xb1?)
runtime/netpoll.go:575 +0xf7 fp=0xc000081de0 sp=0xc000081da8 pc=0x55b173e82357
internal/poll.runtime_pollWait(0x7f0f347c6568, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc000081e00 sp=0xc000081de0 pc=0x55b173ebd9e5
internal/poll.(*pollDesc).wait(0xc000476380?, 0xc0002de041?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000081e28 sp=0xc000081e00 pc=0x55b173f45007
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000476380, {0xc0002de041, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc000081ec0 sp=0xc000081e28 pc=0x55b173f462fa
net.(*netFD).Read(0xc000476380, {0xc0002de041?, 0xc000081f48?, 0x55b173ec0370?})
net/fd_posix.go:55 +0x25 fp=0xc000081f08 sp=0xc000081ec0 pc=0x55b173fb0ae5
net.(*conn).Read(0xc000088568, {0xc0002de041?, 0x0?, 0x55b1757ca6a0?})
net/net.go:189 +0x45 fp=0xc000081f50 sp=0xc000081f08 pc=0x55b173fbf0e5
net.(*TCPConn).Read(0x55b1757070c0?, {0xc0002de041?, 0x0?, 0x0?})
<autogenerated>:1 +0x25 fp=0xc000081f80 sp=0xc000081f50 pc=0x55b173fd22e5
net/http.(*connReader).backgroundRead(0xc0002de030)
net/http/server.go:690 +0x37 fp=0xc000081fc8 sp=0xc000081f80 pc=0x55b17420e5d7
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x55b17420e505
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x55b173ec6ac1
created by net/http.(*connReader).startBackgroundRead in goroutine 91
net/http/server.go:686 +0xb6
rax 0x0
rbx 0x7f0ead9fe640
rcx 0x7f0f340419fc
rdx 0x6
rdi 0x230
rsi 0x23b
rbp 0x23b
rsp 0x7f0ead9fc5e0
r8 0x7f0ead9fc6b0
r9 0xa2e64656c6961
r10 0x8
r11 0x246
r12 0x6
r13 0x16
r14 0x7f0f3200fdf5
r15 0xffffaaaeb8400000
rip 0x7f0f340419fc
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
Intel UHD Graphics 620 is not supported. See the supported list: https://dgpu-docs.intel.com/devices/hardware-table.html#gpus-with-supported-drivers
OK, for my GPU it says: "GPUs with unsupported drivers: Driver support for the following devices remains available but is not under active development." Maybe someone can have a look anyway. When it works, there is a significant boost.
But what is really, really strange is that the crash only happens with some programs that use the Ollama API. "ollama run" doesn't crash, and some conversations in Open WebUI don't crash either.
Devices outside the supported list are not tested, and we don't plan to support them. They are too slow for LLM use cases. You can run Ollama on the CPU instead.
I did a CPU/GPU comparison and yeah, the GPU is even slower, so no need to look at this anymore. CPU (llama3.1): 3.74 tokens/s; GPU (llama3.1): 2.90 tokens/s.
Hmm, I think I prefer the GPU, as running on the CPU slows down my laptop considerably. A fix for this would be really appreciated. It only crashes with bigger context. Maybe it's just a parameter that has to be changed?
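For anyone who wants to experiment with the context-size parameter mentioned above, here is a minimal sketch that sends a request to the Ollama API with an explicit `num_ctx` option. It assumes Ollama is listening on its default address and that the model is pulled as `llama3.1`; the function names are made up for illustration. Whether a smaller context window actually avoids the assertion in sdp_xmx_kernel.cpp has not been verified in this thread.

```python
import json
import urllib.request

# Ollama's default listen address; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, num_ctx=2048, model="llama3.1"):
    """Build a non-streaming /api/generate request with an explicit context window."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # num_ctx sets the context window size for this request.
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, num_ctx=2048):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt, num_ctx)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate(long_prompt, num_ctx=1024)` and then repeating with larger values is one way to narrow down the prompt size at which the crash starts.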
I'm using an i7-6700T and it runs very acceptably in my case. The CPU has an HD 530 GPU, and it runs with the latest intelanalytics/ipex-llm-inference-cpp-xpu image, built locally.
Output from sycl-ls:
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) HD Graphics 530 9.0.9 [1.5.30872.220000]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) HD Graphics 530 OpenCL 3.0 NEO [24.35.30872.22]
Before, I tried to run Ollama without ipex-llm, but it was very slow and used all the cores at 100%. Now it uses only one core and everything else is done on the GPU, which I also verified with intel_gpu_top. It may not be officially supported, but it runs well in my case.
Strange, my "newer" GPU (UHD Graphics 620) is crashing with bigger context. @Intel, can you please have a look at it, even though older GPUs are not supported? I really prefer to keep AI compute off the CPU, so the laptop doesn't feel overloaded. If you give good support, I may buy a Lunar Lake system from you.
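Since the crash seems tied to context size, it may help to see how the KV cache in the original log scales with context. The sketch below reproduces the reported "KV self size = 1012.00 MiB" from the model metadata in that log. It assumes an unquantized f16 K/V cache and a head dimension of embedding_length / head_count; the helper names are made up for illustration, and the token count it derives is computed, not something the log states.

```python
# Values copied from the llama_model_loader lines in the original post.
BLOCK_COUNT = 32          # llama.block_count
EMBEDDING_LENGTH = 4096   # llama.embedding_length
HEAD_COUNT = 32           # llama.attention.head_count
HEAD_COUNT_KV = 8         # llama.attention.head_count_kv
BYTES_PER_F16 = 2         # assumption: K and V are stored as f16
MIB = 1024 * 1024


def kv_bytes_per_token():
    """Bytes of K plus V stored per token across all layers."""
    head_dim = EMBEDDING_LENGTH // HEAD_COUNT               # 128
    per_layer = HEAD_COUNT_KV * head_dim * BYTES_PER_F16    # K (or V) for one layer
    return 2 * per_layer * BLOCK_COUNT                      # K and V, all layers


def kv_cache_mib(n_tokens):
    """KV cache size in MiB for a given number of cached tokens."""
    return n_tokens * kv_bytes_per_token() / MIB


def tokens_for_mib(size_mib):
    """Number of tokens that fit in a KV cache of the given size."""
    return int(size_mib * MIB // kv_bytes_per_token())


print(tokens_for_mib(1012))   # 8096
print(kv_cache_mib(8096))     # 1012.0
print(kv_cache_mib(131072))   # 16384.0
```

Under these assumptions the cache in the log was sized for 8096 tokens, and the model's full 131072-token context would need 16384 MiB, which is more than the 15030M of global memory the device table reports.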
Seeing this issue as well with dual Intel Arc Pro B50 GPUs; it also happens with just one GPU installed. Most models work with fresh chats in Open WebUI, but larger character files with initial context in something like SillyTavern crash with this error.