Model crashes: IPEX-LLM Ollama v2.3.0-nightly: SIGABRT in sdp_xmx_kernel.cpp with Llama 3.1 on Arc B50 Pro

Open maeffjus opened this issue 1 month ago • 2 comments

The IPEX-LLM packaged Ollama (v2.3.0-nightly build 20250725 for Ubuntu, from ollama-ipex-llm-2.3.0b20250725-ubuntu.tgz) crashes with SIGABRT due to an assertion failure in sdp_xmx_kernel.cpp when attempting to load or run Llama 3.1 models (e.g., llama3.1:8b, llama3.1:8b-instruct-q5_K_M). This occurs on an Intel Arc B50 Pro GPU with current drivers. Other models like gemma2:9b-instruct-q5_K_M work correctly with GPU acceleration on the same setup.

How to reproduce

Assuming a working Ubuntu system with appropriate Intel GPU drivers and the extracted ollama-ipex-llm-2.3.0b20250725-ubuntu package:

Set the required environment variables:

    export OLLAMA_LLM_LIBRARY=$(pwd)/llm_c_intel
    export LD_LIBRARY_PATH=$(pwd)/llm_c_intel/lib:${LD_LIBRARY_PATH}
    export ZES_ENABLE_SYSMAN=1

Start the Ollama server in the background:

    ./ollama serve &

Attempt to run a Llama 3.1 model:

    ./ollama run llama3.1:8b "Test"

Observe the server process crashing with SIGABRT; its logs show the sdp_xmx_kernel.cpp assertion failure quoted under "Key Log Output during Crash".

Screenshots

N/A. Relevant log output below.

Environment information

GPU: Intel Arc B50 Pro
OS: Ubuntu 24.04.3 LTS (Noble Numbat)
Kernel: 6.14.0-33-generic #33 24.04.1-Ubuntu
GPU Drivers (from ppa:kobuk-team/intel-graphics):
    intel-opencl-icd: 25.35.35096.9-1~24.04~ppa3
    libze-intel-gpu1: 25.35.35096.9-1~24.04~ppa3
    libze1: 1.24.1-1~24.04~ppa1

IPEX-LLM Ollama Version: v2.3.0-nightly (Build 20250725 from ollama-ipex-llm-2.3.0b20250725-ubuntu.tgz)

Additional context

The model gemma2:9b-instruct-q5_K_M works correctly.

Key Log Output during Crash:

    [...]
    ollama-bin: /home/runner/_work/llm.cpp/llm.cpp/llm.cpp/bigdl-core-xe/llama_backend/sdp_xmx_kernel.cpp:439: auto ggml_sycl_op_sdp_xmx_casual(...)::(anonymous class)::operator()() const: Assertion `false' failed.
    SIGABRT: abort
    PC=0x742c8f49eb2c m=3 sigcode=18446744073709551610
    signal arrived during cgo execution
    [...] (Goroutine stack trace follows)
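When re-testing across reboots or model reloads, it can help to check a saved server log for this specific assertion instead of reading it by eye. The following is a minimal sketch: `has_sdp_xmx_assert` is a hypothetical helper name, and the pattern assumes the assertion message appears on a single line, as it does in the excerpt above.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given log file
# contains an assertion failure raised from sdp_xmx_kernel.cpp.
has_sdp_xmx_assert() {
    # Match "<file>:<line>: ... Assertion ... failed" on one line.
    grep -Eq 'sdp_xmx_kernel\.cpp:[0-9]+: .*Assertion .* failed' "$1"
}

# Example usage (assumes the server output was redirected to server.log):
#   ./ollama serve > server.log 2>&1 &
#   ./ollama run llama3.1:8b "Test"
#   has_sdp_xmx_assert server.log && echo "hit the sdp_xmx assertion"
```

The function only inspects a file, so it can also be pointed at logs collected from earlier runs.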

maeffjus, Oct 28 '25 22:10

I just packed the B50 Pro to send it back and replaced it with an A4000 SFF, so it doesn't matter to me anymore. A little side note: I updated the kernel to 6.14-OEM and it worked for a short while, but after 3-4 reboots and several model reloads, the issue came back. It also happened with lots of models when the prompt was longer (though still well within the available VRAM).

maeffjus, Oct 30 '25 18:10

According to the Intel driver installation documentation, if you want to use the B50, you should use a recent Ubuntu release (25.04 or 25.10) instead of Ubuntu 24.04:

For hardware devices that require at least kernel version 6.14, use Ubuntu 25.04 (Plucky Puffin) or newer. For hardware devices that require at least kernel version 6.8, use Ubuntu 24.04 (Noble) or newer. For hardware devices that require at least kernel version 5.15, use Ubuntu 22.04 (Jammy) or newer.

https://dgpu-docs.intel.com/driver/client/overview.html#selecting-the-right-operating-system-version
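The release guidance quoted above can be checked on a given machine with a few lines of shell. This is only a sketch: `version_ge` and `check_ubuntu_release` are hypothetical helper names, the 25.04 threshold is taken from the quoted passage for hardware that needs kernel 6.14, and `sort -V` is assumed to be available (GNU coreutils). It checks the Ubuntu release rather than the running kernel, since the reporter's 24.04 system was already on a 6.14 kernel.

```shell
# version_ge A B: succeeds when version string A is >= B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_ubuntu_release [FILE]: reads an os-release file (default
# /etc/os-release) in a subshell and reports whether the Ubuntu
# release is at least 25.04. Exit status: 0 = yes, 1 = older,
# 2 = not Ubuntu or file unreadable.
check_ubuntu_release() (
    os_release="${1:-/etc/os-release}"
    . "$os_release" || exit 2
    [ "${ID:-}" = "ubuntu" ] || { echo "not Ubuntu"; exit 2; }
    if version_ge "${VERSION_ID:-0}" "25.04"; then
        echo "Ubuntu ${VERSION_ID}: at or above 25.04"
    else
        echo "Ubuntu ${VERSION_ID}: older than 25.04"
        exit 1
    fi
)
```

Running `check_ubuntu_release` with no argument inspects the local system; passing a path lets the same check run against an os-release file copied from another machine.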

ca1ic0, Dec 02 '25 01:12