intel-extension-for-pytorch

`RuntimeError: could not create a primitive` for `torch.matmul` on Arc A730M and Arc A750 for Windows

Open Oscilloscope98 opened this issue 7 months ago • 13 comments

Describe the bug

Machine: Arc A730M (the same bug also occurs on Arc A750)
OS: Windows 11
Driver: 31.0.101.5081 (the same bug also occurs with version 31.0.101.5084)
oneAPI: 2024.0

Code to reproduce:

call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
python test.py

test.py:

import torch
import intel_extension_for_pytorch as ipex

tensor1 = torch.randn(1, 1, 40, 128).to('xpu')
tensor2 = torch.randn(1, 1, 128, 40).to('xpu')
print(tensor1.dtype)

torch.matmul(tensor1, tensor2).size()

Error message:

Traceback (most recent call last):
  File "D:\yuwen\test.py", line 8, in <module>
    torch.matmul(tensor1, tensor2).size()
RuntimeError: could not create a primitive

The error persists even when we run set ONEAPI_DEVICE_SELECTOR=level_zero:0 so that only the A730M is visible to the environment.
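For anyone reproducing this from a single script, the selector can also be set from Python instead of with `set` in the shell. This is only a sketch: `select_sycl_device` is a hypothetical helper name (not an IPEX API), and it assumes the variable needs to be in the process environment before `torch` and `intel_extension_for_pytorch` are imported, since the SYCL runtime reads it when it initializes.

```python
import os


def select_sycl_device(selector: str = "level_zero:0") -> str:
    """Set ONEAPI_DEVICE_SELECTOR for the current process.

    Hypothetical helper. Call it before importing torch / IPEX so the
    SYCL runtime only enumerates the requested device.
    """
    os.environ["ONEAPI_DEVICE_SELECTOR"] = selector
    return os.environ["ONEAPI_DEVICE_SELECTOR"]


# Restrict the process to the first Level Zero device (the A730M above).
select_sycl_device("level_zero:0")

# Import the frameworks only after the variable is set, for example:
#   import torch
#   import intel_extension_for_pytorch as ipex
```

As reported above, restricting the device list this way did not make the error go away, so this is useful for isolating the dGPU rather than as a fix.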

sycl-ls output on machine with A730M:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i7-12700H OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A730M Graphics OpenCL 3.0 NEO  [31.0.101.5081]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [31.0.101.5081]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A730M Graphics 1.3 [1.3.27616]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.27616]
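To double-check which device `level_zero:0` refers to, the listing above can be parsed mechanically. The following is a rough sketch, assuming the `[backend:type:index] platform, description` line layout shown in this output and assuming the bracketed index is the one `ONEAPI_DEVICE_SELECTOR` uses within a backend; `parse_sycl_ls` and `level_zero_gpus` are illustrative names, not part of any Intel tool.

```python
import re

# Matches lines such as:
# [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A730M Graphics 1.3 [1.3.27616]
_LINE = re.compile(
    r"^\[(?P<backend>[^:\]]+):(?P<kind>[^:\]]+):(?P<index>\d+)\]\s+"
    r"(?P<platform>[^,]+),\s+(?P<description>.+)$"
)


def parse_sycl_ls(text):
    """Return one dict per device line; lines that do not match are skipped."""
    devices = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["index"] = int(entry["index"])
            devices.append(entry)
    return devices


def level_zero_gpus(text):
    """Return (index, description) pairs for GPUs on the Level Zero backend."""
    return [
        (d["index"], d["description"])
        for d in parse_sycl_ls(text)
        if d["backend"].endswith("level_zero") and d["kind"] == "gpu"
    ]


SAMPLE = """\
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A730M Graphics OpenCL 3.0 NEO  [31.0.101.5081]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A730M Graphics 1.3 [1.3.27616]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.27616]
"""

for index, description in level_zero_gpus(SAMPLE):
    print(index, description)
```

On this machine the Level Zero entry with index 0 is the Arc A730M and index 1 is the integrated Iris Xe GPU.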

Versions

PyTorch version: 2.1.0a0+cxx11.abi
PyTorch CXX11 ABI: No
IPEX version: 2.1.10+xpu
IPEX commit: a12f9f650
Build type: Release

OS: Microsoft Windows 11 Pro
GCC version: N/A
Clang version: N/A
IGC version: 2024.0.0 (2024.0.0.20231017)
CMake version: version 3.27.2-msvc1
Libc version: N/A

Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration:
[0] _DeviceProperties(name='Intel(R) Arc(TM) A730M Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=11934MB, max_compute_units=384, gpu_eu_count=384)
[1] _DeviceProperties(name='Intel(R) Iris(R) Xe Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=0, total_memory=14751MB, max_compute_units=96, gpu_eu_count=96)
Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture=9
CurrentClockSpeed=2300
DeviceID=CPU0
Family=198
L2CacheSize=7680
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2300
Name=12th Gen Intel(R) Core(TM) i7-12700H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.10+xpu
[pip3] numpy==1.26.3
[pip3] torch==2.1.0a0+cxx11.abi
[pip3] torchaudio==2.1.0a0+cxx11.abi
[pip3] torchvision==0.16.0a0+cxx11.abi
[conda] intel-extension-for-pytorch 2.1.10+xpu pypi_0 pypi
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchaudio 2.1.0a0+cxx11.abi pypi_0 pypi
[conda] torchvision 0.16.0a0+cxx11.abi pypi_0 pypi

Oscilloscope98, Jan 11 '24 09:01

@ashokei @min-jean-cho FYI, this issue occurs on Windows.

jingxu10, Jan 26 '24 21:01

@Oscilloscope98, is the error reproducible with different input sizes (e.g., smaller input sizes)?

min-jean-cho, Jan 26 '24 21:01

Hi @min-jean-cho,

The same problem occurs with much smaller inputs:

import torch
import intel_extension_for_pytorch as ipex

tensor1 = torch.randn(1, 1, 1, 2).to('xpu')
tensor2 = torch.randn(1, 1, 2, 1).to('xpu')

torch.matmul(tensor1, tensor2).size()
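The size question can be answered in one run by sweeping several shapes and recording which ones fail. Below is a minimal sketch of such a harness; `sweep_matmul`, `SHAPES`, and `run_on_devices` are hypothetical names introduced here, and the harness only catches `RuntimeError`, which is the exception type shown in the traceback above.

```python
def sweep_matmul(matmul, make_inputs, shapes):
    """Run matmul for each (shape_a, shape_b) pair and record the outcome.

    matmul:      callable taking two tensors (for example torch.matmul)
    make_inputs: callable building the two input tensors from two shapes
    Returns a list of (shape_a, shape_b, status, detail) tuples.
    """
    results = []
    for shape_a, shape_b in shapes:
        try:
            a, b = make_inputs(shape_a, shape_b)
            out = matmul(a, b)
            results.append((shape_a, shape_b, "ok", tuple(out.shape)))
        except RuntimeError as exc:
            results.append((shape_a, shape_b, "error", str(exc)))
    return results


# The two shape pairs reported in this issue.
SHAPES = [
    ((1, 1, 1, 2), (1, 1, 2, 1)),
    ((1, 1, 40, 128), (1, 1, 128, 40)),
]


def run_on_devices(devices=("cpu", "xpu")):
    """Compare CPU and XPU behaviour; requires torch and IPEX to be installed."""
    import torch
    import intel_extension_for_pytorch  # noqa: F401  (registers the 'xpu' device)

    for device in devices:
        def make_inputs(shape_a, shape_b, device=device):
            return torch.randn(*shape_a).to(device), torch.randn(*shape_b).to(device)

        for row in sweep_matmul(torch.matmul, make_inputs, SHAPES):
            print(device, *row)
```

Calling `run_on_devices()` on the affected machine would show whether the CPU path succeeds while every XPU shape fails, which would point at device or primitive setup rather than at the input size.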

P.S. Tested with driver 31.0.101.5081 on the Arc A730M machine.

Oscilloscope98, Jan 30 '24 02:01

Working on triage.

jingxu10, Apr 05 '24 02:04

Is the graphics card attached to a monitor, and is the iGPU disabled in BIOS?

jingxu10, Apr 11 '24 07:04

> Is the graphics card attached to a monitor, and is the iGPU disabled in BIOS?

Hi @jingxu10,

The Arc A730M is in a NUC machine, and we did not disable the iGPU in BIOS.

For the Arc A750, I am not sure whether the graphics card was attached to a monitor, but the iGPU was not disabled in BIOS there either.

Oscilloscope98, Apr 16 '24 02:04

We found an issue where the card has to be attached to a monitor and the iGPU has to be disabled for the dGPU to work. We are triaging this issue.

jingxu10, Apr 16 '24 23:04