intel-extension-for-pytorch
RuntimeError: UR error on ADL-N with IPEX 2.5/2.6 using .to(torch.float16)
Describe the bug
Problem statement

Executing basic PyTorch operations on the integrated GPU of an Intel N150 (Alder Lake N, "ADL-N") via the xpu device consistently fails with RuntimeError: UR error. This occurs even for fundamental operations such as data type conversion (.to(torch.float16)).
Environments
- Self-compiled IPEX/PyTorch: IPEX v2.5.10+xpu / PyTorch v2.5.1 built from source using the `compile_bundle.sh` script against the oneAPI 2025.0.2 toolkit (from `intel/oneapi-basekit:2025.0.2-0-devel-ubuntu24.04`).
- Official pre-built Docker images: `intel/intel-extension-for-pytorch:2.5.10-xpu` and `intel/intel-extension-for-pytorch:2.6.10-xpu`.
MWE

The following minimal command reliably reproduces the error inside the affected Docker environments (both custom-built and official) when run with access to the host GPU:

```
SYCL_UR_TRACE=1 python -c "import torch; import intel_extension_for_pytorch; t = torch.tensor([0], device='xpu'); print(t.to(torch.float16))"
```
Output

Running the minimal example produces the following output, including UR loader messages and the final traceback. This output is consistent across the self-built environment and the official 2.5.10-xpu / 2.6.10-xpu Docker images.
```
[W401 10:47:29.656434432 OperatorEntry.cpp:155] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
registered at /ipex/pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /ipex/pytorch/build/aten/src/ATen/RegisterCPU.cpp:30476
new kernel: registered at /ipex/intel-extension-for-pytorch/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:2971 (function operator())
<LOADER>[INFO]: loaded adapter 0x0x61763fc88180 (libur_adapter_level_zero.so.0)
<LOADER>[INFO]: loaded adapter 0x0x61763fc89cc0 (libur_adapter_opencl.so.0)
<LOADER>[INFO]: failed to load adapter 'libur_adapter_cuda.so.0' with error: libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_cuda.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_hip.so.0' with error: libur_adapter_hip.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_hip.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter 'libur_adapter_native_cpu.so.0' with error: libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
<LOADER>[INFO]: failed to load adapter '/opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0' with error: /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_native_cpu.so.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: UR error
```
Running with `SYCL_UR_TRACE=-1` additionally reveals:

```
(.hQueue = 0x585e6230dc20, .hKernel = 0x585e6244def0, .workDim = 1, .pGlobalWorkOffset = 0x585e5f064b40 (0), .pGlobalWorkSize = 0x585e5f064b10 (224), .pLocalWorkSize = 0x585e5f064b28 (224), .numEventsInWaitList = 0, .phEventWaitList = {}, .phEvent = 0x7ffca67e6718 (0x585e6244f6a0)) -> UR_RESULT_ERROR_INVALID_ARGUMENT;
```
This might be related to this issue in the PyTorch repo, but I am posting here since I compiled via IPEX and am not 100% sure about the relevance.
Versions
Thanks for reporting this, we will take a look and get back to you.
Hi, would you run the command with unitrace and provide the output log?
unitrace can be installed following the guidance here. Basically the steps would be:
- Ensure the oneAPI Base Toolkit environment variables are activated (e.g. via `source /opt/intel/oneapi/setvars.sh`)
- `mkdir build && cd build`
- `cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_WITH_MPI=0 ..`
- `make install`
You may need to log in as root in case a permission error is raised like:
```
CMake Error at cmake_install.cmake:52 (file):
  file INSTALL cannot copy file
  "/test/unitrace/pti-gpu/tools/unitrace/build/unitrace" to
  "/usr/local/bin/unitrace": Success.
```
After the installation completes, try the following command:

```
NEOReadDebugKeys=1 PrintDebugSettings=1 PrintDebugMessages=1 LogAlignedAllocations=1 LogAllocationMemoryPool=1 LogAllocationType=1 LogAllocationStdout=1 nohup unitrace python -c "import torch; import intel_extension_for_pytorch; t = torch.tensor([0], device='xpu'); print(t.to(torch.float16))" > unitrace_out.log &
```
Here: unitrace_out.log
Hi @blue-notes-robot, thanks for the report! I could reproduce it on my local machine. I believe it is a driver bug; because the (12th Gen) iGPU is old and not in our test matrix, there may be bugs we haven't tracked before.
I have submitted an internal tracking ticket to the driver team and will report back when there is an update.
Just to supplement: this error is also reproducible on the iGPU of an Intel Core i7-12700H (also 12th Gen). The most affected operations are still .to(torch.dtype) conversions.
Just checking in, did the driver team pick it up?
Hello, just FYI, I have the same issue on a 13th Gen Intel Core i9-13950HX.
Is there an update on this? I have the same error on a 13th Gen Intel(R) Core(TM) i7-1365URE.
`torch.xpu.get_device_properties()` returns:

```
_XpuDeviceProperties(name='Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.33578+15', total_memory=29486MB, max_compute_units=96, gpu_eu_count=96, gpu_subslice_count=6, max_work_group_size=512, max_num_sub_groups=64, sub_group_sizes=[8 16 32], has_fp16=1, has_fp64=0, has_atomic64=1)
```
Simple MWE:

```python
import torch
t = torch.tensor([0], device="xpu")
t.to(torch.float16)
```
gives me:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: UR error
```
Is this because the device (Iris(R) Xe Graphics) simply does not have the capability?
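For reference, a quick way to inspect those capability flags from Python is sketched below; it assumes the `has_fp16` / `has_fp64` fields reported by `torch.xpu.get_device_properties()` (visible in the properties dump above) reflect native hardware support:

```python
import torch

# Read the capability flags of the default XPU device; these are the same
# fields shown in the _XpuDeviceProperties dump above.
props = torch.xpu.get_device_properties()
print("fp16 supported:", bool(props.has_fp16))  # reported as 1 above
print("fp64 supported:", bool(props.has_fp64))  # reported as 0 above
```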
Hi, sorry for the late reply. This issue is the same as https://github.com/pytorch/pytorch/issues/149953.
It happens because the iGPU is old and lacks fp64 support, while PyTorch's to() internally introduces the fp64 dtype, which triggers the UR error. Please try the workaround with the env flags:

```
set OverrideDefaultFP64Settings=1
set IGC_EnableDPEmulation=1
```
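Note that `set` is Windows (cmd) syntax; on Linux the flags would be exported in the shell before launching Python, or set via os.environ before torch is imported. A minimal sketch of the latter, assuming the driver reads these variables when the XPU runtime first initializes:

```python
# Sketch: set the driver flags before torch initializes the XPU runtime.
# Assumption: OverrideDefaultFP64Settings / IGC_EnableDPEmulation are read
# from the environment at device initialization, so they must be in place
# before the first xpu call.
import os

os.environ["OverrideDefaultFP64Settings"] = "1"  # expose fp64 despite no native support
os.environ["IGC_EnableDPEmulation"] = "1"        # ask IGC to emulate double precision

import torch  # import only after the flags are set

t = torch.tensor([0], device="xpu")
print(t.to(torch.float16))  # the conversion that previously raised "UR error"
```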
@Stonepia Thanks for the reply, but the workaround didn't work for me. MWE:
```python
import os
os.environ["OverrideDefaultFP64Settings"] = "1"
os.environ["IGC_EnableDPEmulation"] = "1"

import torch

print("OverrideDefaultFP64Settings:", os.environ.get("OverrideDefaultFP64Settings"))
print("IGC_EnableDPEmulation:", os.environ.get("IGC_EnableDPEmulation"))

t = torch.tensor([0], device="xpu")
t.to(torch.float64)
```
gives me:

```
OverrideDefaultFP64Settings: 1
IGC_EnableDPEmulation: 1
Traceback (most recent call last):
  File "/home/ubuntu/workspace/code/anomalib/workspace_dir/test_env.py", line 11, in <module>
    t.to(torch.float64)
RuntimeError: UR error
```
Am I missing something?
I bought a new mini PC with a Core Ultra 5 225H and tested it with PyTorch, LLM, and Stable Diffusion workloads. Everything works perfectly; no UR error anymore. It may save us all time to just use newer hardware.
I have the same issue with xpu; nothing helped. Only torch-cpu works, but it takes an hour to generate an image with flux.krea.