ipex-llm
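Running ipex on Windows 11 fails with: AMX state allocation in the OS failed
I installed WSL2 on Windows 11 Pro and installed Docker Desktop under WSL. Running PyTorch code inside the image fails with an error.
Command used to start the container: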
docker run -itd --privileged --device=/dev/dri -v /c//models:/llm/models -v /usr/lib/wsl:/usr/lib/wsl --name=arc_vllm --shm-size="16g" intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2
Devices visible inside the container:
root@d748cc3e41df:/llm/models/resnet# sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) w3-2425 OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x56a0] OpenCL 3.0 NEO [23.35.27191.42]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x56a0] OpenCL 3.0 NEO [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.3 [1.3.26241]
Running the script:
root@d748cc3e41df:/llm/models/resnet# python test_torch.py
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
Abort was called at 62 line in file:
./shared/source/os_interface/os_interface.h
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)LIBXSMM WARNING: AMX state allocation in the OS failed!
LIBXSMM_TARGET: clx [Intel(R) Xeon(R) w3-2425]
Registry and code: 13 MB
Command: python test_torch.py
Uptime: 2.493819 s
Aborted
Script contents:
root@d748cc3e41df:/llm/models/resnet# cat test_torch.py
import torch
import intel_extension_for_pytorch as ipex
tensor_1 = torch.randn(1, 1, 40, 128).to('xpu')
tensor_2 = torch.randn(1, 1, 128, 40).to('xpu')
print(torch.matmul(tensor_1, tensor_2).size())
# torch.Size([1, 1, 40,40])
We will try to reproduce this on our side first.
We have been in contact with the user via WeChat. We could not reproduce the issue on our own machine, arc17 (System: Windows 11, CPU: i9 13900K, GPU: Arc A770). On the customer's machine (System: Windows 11, CPU: Xeon(R) w3-2425, GPU: Arc A770), which we accessed via Sunlogin, the issue reproduces inside the container but not in the conda environment on the host.
Based on our discussions with the customer, we suspect the issue is related either to Windows 11's support for AMX on desktop systems or to the WSL and Docker layer. The customer is considering switching to a Linux system for testing.
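One way to gather evidence for the AMX suspicion is to check, from inside the container, whether the Linux kernel that WSL2 runs reports any AMX feature flags. The following is only a minimal sketch, assuming an x86 Linux `/proc/cpuinfo` with the usual `flags` line; the helper name `amx_flags` is hypothetical and is not part of ipex-llm, IPEX, or LIBXSMM.

```python
from pathlib import Path


def amx_flags(cpuinfo_text: str) -> set[str]:
    """Return the AMX-related feature flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" line lists CPU features (x86 Linux layout assumed).
        if sep and key.strip() == "flags":
            found.update(f for f in value.split() if f.startswith("amx"))
    return found


if __name__ == "__main__":
    # Hypothetical diagnostic run: read the real file if it exists.
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    flags = amx_flags(text)
    print("AMX flags reported by the kernel:", sorted(flags) or "none")
```

If no AMX flags are listed inside the container, that would be consistent with the suspicion that the WSL and Docker layer, rather than ipex itself, is involved. This is a diagnostic only and does not fix the abort.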