
Running ipex on Windows 11 fails with: AMX state allocation in the OS failed

Open showyouit opened this issue 1 year ago • 3 comments

On Windows 11 Pro, I installed WSL2 and then Docker Desktop under WSL. Running PyTorch code inside the image fails with an error.

Command used to start the container:

docker run -itd --privileged --device=/dev/dri -v /c//models:/llm/models -v /usr/lib/wsl:/usr/lib/wsl --name=arc_vllm --shm-size="16g" intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2 

Devices visible inside the container:

root@d748cc3e41df:/llm/models/resnet# sycl-ls

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) w3-2425 OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x56a0] OpenCL 3.0 NEO  [23.35.27191.42]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x56a0] OpenCL 3.0 NEO  [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x56a0] 1.3 [1.3.26241]

Running the script:

root@d748cc3e41df:/llm/models/resnet# python test_torch.py

/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Abort was called at 62 line in file:
./shared/source/os_interface/os_interface.h

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)LIBXSMM WARNING: AMX state allocation in the OS failed!

LIBXSMM_TARGET: clx [Intel(R) Xeon(R) w3-2425]
Registry and code: 13 MB
Command: python test_torch.py
Uptime: 2.493819 s
Aborted

Script contents:

root@d748cc3e41df:/llm/models/resnet# cat test_torch.py

import torch
import intel_extension_for_pytorch as ipex

tensor_1 = torch.randn(1, 1, 40, 128).to('xpu')
tensor_2 = torch.randn(1, 1, 128, 40).to('xpu')
print(torch.matmul(tensor_1, tensor_2).size())

# Expected output: torch.Size([1, 1, 40, 40])
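
One way to narrow down which step triggers the abort is to run the same reproduction in stages and flush a message after each one, so the last line printed before "Aborted" points at the failing step. The following is only a sketch and was not tried in this thread; the function name run_stages and the stage labels are made up for illustration, and it assumes the same torch and intel_extension_for_pytorch packages as the script above.

```python
import importlib


def run_stages():
    """Run the reproduction step by step; return the names of completed stages."""
    done = []

    def mark(stage):
        # Flush immediately so the message survives a hard abort in a later stage.
        done.append(stage)
        print(f"[ok] {stage}", flush=True)

    try:
        torch = importlib.import_module("torch")
    except Exception as exc:
        print(f"import torch failed: {exc!r}", flush=True)
        return done
    mark("import torch")

    try:
        importlib.import_module("intel_extension_for_pytorch")
    except Exception as exc:
        print(f"import intel_extension_for_pytorch failed: {exc!r}", flush=True)
        return done
    mark("import intel_extension_for_pytorch")

    # Older torch builds may not expose torch.xpu at all, hence the getattr.
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        print("no XPU device is visible to torch", flush=True)
        return done
    mark("xpu available")

    # Same shapes as the original test_torch.py.
    tensor_1 = torch.randn(1, 1, 40, 128).to("xpu")
    tensor_2 = torch.randn(1, 1, 128, 40).to("xpu")
    mark("tensors moved to xpu")

    result = torch.matmul(tensor_1, tensor_2)
    mark(f"matmul finished with size {tuple(result.size())}")
    return done


if __name__ == "__main__":
    run_stages()
```

A process-level abort cannot be caught from Python, which is why each stage prints before the next one starts rather than relying on exception handling.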

showyouit avatar Aug 08 '24 01:08 showyouit

Will try to reproduce from our side first.

liu-shaojun avatar Aug 08 '24 02:08 liu-shaojun

[Attached image: 908376fea6c33c461b83f15181e18b5]

glorysdj avatar Aug 08 '24 02:08 glorysdj

After talking with the user over WeChat, we could not reproduce the issue on our own machine, arc17 (System: Windows 11, CPU: i9 13900K, GPU: Arc A770). When we connected to the customer's machine (System: Windows 11, CPU: Xeon(R) w3-2425, GPU: Arc A770) via Sunlogin, we reproduced the issue inside the container, but not in the conda environment on the host.

Based on our discussions with the customer, we suspect the issue might be related either to AMX support on Windows 11 desktop systems or to the WSL and Docker setup. The customer is considering switching to a Linux system for testing.

liu-shaojun avatar Aug 09 '24 01:08 liu-shaojun
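
Since the suspicion above centers on AMX support inside WSL and Docker, a quick check of which AMX feature flags the container's kernel reports may help when comparing machines. The sketch below reads /proc/cpuinfo and looks for the flag names amx_tile, amx_bf16, and amx_int8, which are the names the Linux kernel uses. The helper names cpu_flags and amx_report are hypothetical. A reported flag only shows that the CPU feature is visible, not that the OS will actually grant the AMX state that LIBXSMM asks for.

```python
from pathlib import Path

# AMX feature flag names as they appear in the "flags" line of /proc/cpuinfo.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def amx_report(cpuinfo_text):
    """Map each AMX flag name to True if it is present in the cpuinfo text."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in AMX_FLAGS}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in amx_report(path.read_text()).items():
            print(f"{name}: {'present' if present else 'missing'}")
    else:
        print("/proc/cpuinfo not found (not a Linux environment)")
```

Running it once inside the container and once in a plain WSL shell would show whether the two environments expose the same flags.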