
RuntimeError: UR error on ADL-N with IPEX 2.5/2.6 using .to(torch.float16)

Open blue-notes-robot opened this issue 7 months ago • 5 comments

Describe the bug

Problem statement: Executing basic PyTorch operations on an Intel N150 (Alder Lake-N, "ADL-N") integrated GPU via the xpu device consistently fails with RuntimeError: UR error. This occurs even for fundamental operations such as dtype conversion (.to(torch.float16)).

Environments

  • Self-compiled IPEX/PyTorch: IPEX v2.5.10+xpu / PyTorch v2.5.1 built from source using the compile_bundle.sh script against the oneAPI 2025.0.2 toolkit (from intel/oneapi-basekit:2025.0.2-0-devel-ubuntu24.04).
  • Official Pre-built Docker Images: intel/intel-extension-for-pytorch:2.5.10-xpu and intel/intel-extension-for-pytorch:2.6.10-xpu.

MWE The following minimal command reliably reproduces the error inside the affected Docker environments (both custom-built and official) when run with access to the host GPU:

SYCL_UR_TRACE=1 python -c "import torch; import intel_extension_for_pytorch; t = torch.tensor([0], device='xpu'); print(t.to(torch.float16))"

Output Running the minimal example produces the following output, including UR loader messages and the final traceback. This output is consistent across the self-built environment and the official 2.5.10-xpu / 2.6.10-xpu Docker images.

[W401 10:47:29.656434432 OperatorEntry.cpp:155] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
    registered at /ipex/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at /ipex/pytorch/build/aten/src/ATen/RegisterCPU.cpp:30476
       new kernel: registered at /ipex/intel-extension-for-pytorch/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:2971 (function operator())
<LOADER>[INFO]: loaded adapter 0x0x61763fc88180 (libur_adapter_level_zero.so.0)
<LOADER>[INFO]: loaded adapter 0x0x61763fc89cc0 (libur_adapter_opencl.so.0)
<LOADER>[INFO]: failed to load adapter 'libur_adapter_cuda.so.0' with error: libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_hip.so.0' with error: libur_adapter_hip.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: UR error

Running with SYCL_UR_TRACE=-1 (full tracing) additionally reveals the failing call:

(.hQueue = 0x585e6230dc20, .hKernel = 0x585e6244def0, .workDim = 1, .pGlobalWorkOffset = 0x585e5f064b40 (0), .pGlobalWorkSize = 0x585e5f064b10 (224), .pLocalWorkSize = 0x585e5f064b28 (224), .numEventsInWaitList = 0, .phEventWaitList = {}, .phEvent = 0x7ffca67e6718 (0x585e6244f6a0)) -> UR_RESULT_ERROR_INVALID_ARGUMENT;

This might be related to this issue in the pytorch repo, but I'm posting here since I compiled via IPEX and am not 100% sure about the relevance.

Versions

env.txt

blue-notes-robot avatar Apr 01 '25 11:04 blue-notes-robot

Thanks for reporting this, we will take a look and get back to you.

ZailiWang avatar Apr 01 '25 14:04 ZailiWang

Hi, would you run the command with unitrace and provide the output log? unitrace can be installed following the guidance here. Basically, the steps are:

  • Ensure the oneAPI basekit env variables are activated (via e.g. source /opt/intel/oneapi/setvars.sh)
  • mkdir build && cd build
  • cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_MPI=0 ..
  • make install

You may need to log on as the root account in case a permission error is raised like:

CMake Error at cmake_install.cmake:52 (file):
  file INSTALL cannot copy file
  "/test/unitrace/pti-gpu/tools/unitrace/build/unitrace" to
  "/usr/local/bin/unitrace": Success.

After the installation completes, try with this command:

NEOReadDebugKeys=1 PrintDebugSettings=1 PrintDebugMessages=1 LogAlignedAllocations=1 LogAllocationMemoryPool=1 LogAllocationType=1 LogAllocationStdout=1 nohup unitrace python -c "import torch;  import intel_extension_for_pytorch; t = torch.tensor([0], device='xpu'); print(t.to(torch.float16))" > unitrace_out.log &

ZailiWang avatar Apr 03 '25 06:04 ZailiWang

Here: unitrace_out.log

blue-notes-robot avatar Apr 03 '25 10:04 blue-notes-robot

Hi @blue-notes-robot, thanks for the report! I could reproduce it on my local machine. I believe it is a driver bug; since the device (a 12th Gen iGPU) is old and not in our test matrix, there may be bugs we didn't track before.

I have submitted an internal tracker to the driver team. Will get back when there is an update.

Stonepia avatar Apr 07 '25 07:04 Stonepia

Just to supplement: this error is also reproducible on an Intel Core i7-12700H (also 12th Gen). The most affected operations are still .to(torch.dtype) conversions.
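Since the failure is isolated to the dtype-conversion kernel, one possible stopgap (a hypothetical helper, not from this thread, assuming a working CPU backend) is to route the conversion through the CPU and move the result back:

```python
try:
    import torch
except ImportError:  # keep the sketch importable where torch is absent
    torch = None

def safe_to(t, dtype):
    """Convert dtype on CPU and move back, sidestepping the failing XPU kernel."""
    if t.device.type != "cpu":
        return t.cpu().to(dtype).to(t.device)
    return t.to(dtype)

if torch is not None:
    x = torch.tensor([0])
    print(safe_to(x, torch.float16).dtype)  # torch.float16
```

The extra host round-trip costs a device transfer per call, so this only makes sense as a workaround until the driver fix lands.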

Wetitpig avatar Apr 22 '25 15:04 Wetitpig

Just checking in, did the driver team pick it up?

blue-notes-robot avatar Jul 06 '25 11:07 blue-notes-robot

Hello, just FYI, I have the same issue on a 13th Gen Intel Core i9-13950HX.

smounsav avatar Jul 08 '25 13:07 smounsav

Is there an update on this? I have the same error on a 13th Gen Intel(R) Core(TM) i7-1365URE.

torch.xpu.get_device_properties() 
_XpuDeviceProperties(name='Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.33578+15', total_memory=29486MB, max_compute_units=96, gpu_eu_count=96, gpu_subslice_count=6, max_work_group_size=512, max_num_sub_groups=64, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
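The has_fp64=0 field in the properties above is the relevant capability bit. A small probe (hedged sketch; it returns None where no XPU runtime is visible, e.g. on CPU-only installs) makes the check explicit:

```python
def xpu_fp64_supported():
    """True/False if an XPU device is visible; None if torch or XPU is absent."""
    try:
        import torch
    except ImportError:
        return None
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return None
    # has_fp64 is reported by torch.xpu.get_device_properties, as shown above
    return bool(xpu.get_device_properties(0).has_fp64)

print("fp64 supported:", xpu_fp64_supported())
```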

Simple MWE:

import torch
t = torch.tensor([0], device="xpu")
t.to(torch.float16)

gives me

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: UR error

Is this because the device (Iris(R) Xe Graphics) simply does not have the capability?

rajeshgangireddy avatar Aug 07 '25 12:08 rajeshgangireddy

Hi, sorry for the late reply. This issue is the same as https://github.com/pytorch/pytorch/issues/149953. It happens because the iGPU is old and lacks fp64 support, while PyTorch's to() internally introduces the fp64 dtype, hence the UR error. Please try the workaround with the env flags:

set OverrideDefaultFP64Settings=1
set IGC_EnableDPEmulation=1
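Note that the set syntax above is Windows cmd; inside the Linux Docker images used in this thread, the equivalent (a hedged sketch of the invocation, assuming these NEO/IGC driver flags are read from the environment) would be to export the flags in the shell before launching Python:

```shell
# Export the driver flags so any child Python process inherits them
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1
# then e.g.: python -c "import torch; import intel_extension_for_pytorch; ..."
echo "OverrideDefaultFP64Settings=$OverrideDefaultFP64Settings"
```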

Stonepia avatar Aug 08 '25 00:08 Stonepia

Hi, sorry for the late reply. This issue is the same as pytorch/pytorch#149953. It happens because the iGPU is old and lacks fp64 support, while PyTorch's to() internally introduces the fp64 dtype, hence the UR error. Please try the workaround with the env flags:

set OverrideDefaultFP64Settings=1
set IGC_EnableDPEmulation=1

@Stonepia Thanks for the reply. But the workaround didn't work for me. MWE:

import os
os.environ["OverrideDefaultFP64Settings"] = "1"
os.environ["IGC_EnableDPEmulation"] = "1"

import torch

print("OverrideDefaultFP64Settings:", os.environ.get("OverrideDefaultFP64Settings"))
print("IGC_EnableDPEmulation:", os.environ.get("IGC_EnableDPEmulation"))

t = torch.tensor([0], device="xpu")
t.to(torch.float64)

gives me:

OverrideDefaultFP64Settings: 1
IGC_EnableDPEmulation: 1
Traceback (most recent call last):
  File "/home/ubuntu/workspace/code/anomalib/workspace_dir/test_env.py", line 11, in <module>
    t.to(torch.float64)
RuntimeError: UR error

Am I missing something?

rajeshgangireddy avatar Aug 08 '25 09:08 rajeshgangireddy

I bought a new mini PC with a Core Ultra 5 225H. Tested it with PyTorch, LLMs, and Stable Diffusion: everything works perfectly, no UR error anymore. We should save our time.

kevinzhow avatar Aug 08 '25 09:08 kevinzhow

I have the same issue with xpu; nothing helped. Only torch-cpu works, but it takes 1 hour to generate an image with flux.krea.

zoldaten avatar Aug 26 '25 15:08 zoldaten