Drivers 4885 and newer break IPEX on native Windows.

Open Mindset-Official opened this issue 2 years ago • 14 comments

Describe the issue

I have tried running it in both SD.Next and ComfyUI, and both fail when trying to generate an image. There is no error message; it just seems to crash the WebUI completely. Driver 4676 and older worked perfectly fine. Since there is no error message, I can't really tell you what is broken. I believe the driver team has been notified, but I'm not sure what they can do since it's not officially supported, so I figured I would post here as well.

WSL2 still seems to work fine.

System: Arc A750, Windows 11, AOT-compiled IPEX for Windows, Ryzen 5600, 32 GB DDR4-3200

Mindset-Official, Oct 10 '23 17:10

@min-jean-cho

jingxu10, Oct 11 '23 05:10

+1.

Regarding "there is no error message":

Under some circumstances I do see "Abort was called at 198 line in file:". I believe this is raised by the compute runtime.

I'm trying to isolate the issue.

Nuullll, Oct 11 '23 07:10

Just to confirm, I also got that "Abort was called" message a few times.

Mindset-Official, Oct 12 '23 14:10

Running accelerate with --use_xpu, or with IPEX enabled in the config, also exits with status 3221225477 (0xC0000005, an access violation) on an A750 with Windows 10 and driver 4887.

Vipitis, Oct 12 '23 21:10
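
For anyone decoding an exit status like the one reported above: on Windows, values this large are usually NTSTATUS codes printed as unsigned decimals. The following is a minimal sketch; the helper name describe_exit_status and its small lookup table are illustrative and not part of accelerate or IPEX.

```python
# Hypothetical helper: render a Windows process exit status as an NTSTATUS-style hex value.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(code: int) -> str:
    """Return the exit status as 32-bit hex plus a name when it is in the table."""
    # Some tools report the same status as a negative signed 32-bit integer.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown status")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_status(3221225477))
```

An access violation only says that the process touched invalid memory; it does not identify which component (driver, runtime, or extension) was at fault.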

It seems that driver 4885 broke backward compatibility with previous drivers.

The officially released IPEX Windows JIT wheels work fine with the following reproducer (the image was generated as expected):

import torch
import intel_extension_for_pytorch as ipex  # imported for its side effect: enables the "xpu" device
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion v1.4 in fp16 and move the pipeline to the Intel GPU
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16).to("xpu")
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")

However, if I use IPEX AOT wheels built from source with driver 4676 (or earlier) (for example, https://github.com/Nuullll/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu-master%2Bdll-bundle), the program crashes.

pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.0.110%2Bxpu-master%2Bdll-bundle/torch-2.0.0a0+gite9ebda2-cp310-cp310-win_amd64.whl
pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.0.110%2Bxpu-master%2Bdll-bundle/intel_extension_for_pytorch-2.0.110+gitc6ea20b-cp310-cp310-win_amd64.whl
pip install diffusers transformers
set SYCL_PI_TRACE=2
python reproducer.py 1> trace.log 2>&1

trace.log

---> piKernelCreate(
	<unknown> : 000001CA0D158570
	<const char *>: _ZTSZZN2at15AtenIpexTypeXPUL20launch_legacy_kernelIZNS0_18dpcpp_loops_kernelIZZZNS_4impl21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE3_clEvENKUlvE9_clEvEUlN3c104HalfEE_Lb0ELb1EEEvRNS_18TensorIteratorBaseET_EUliE_EEvxRKSD_ENKUlRN4sycl3_V17handlerEE_clESK_EUlNSI_7nd_itemILi1EEEE_
	<unknown> : 0000000DD9BE8C78
PI ---> (*RetKernel)->initialize()
PI ---> piProgramRetain(Program)
) ---> 	pi_result : PI_SUCCESS
	[out]<unknown> ** : 0000000DD9BE8C78[ 000001C9B5B0E8D0 ... ]

...

---> piEnqueueKernelLaunch(
	<unknown> : 000001C8A3717830
	<unknown> : 000001C9B5B0E8D0
	<unknown> : 1
	<unknown> : 0000000DD9BEA658
	<unknown> : 0000000DD9BEA628
	<unknown> : 0000000DD9BEA640
	<unknown> : 0
	pi_event * : 0000000000000000[ nullptr ]
	pi_event * : 000001C8A7A7D1D8[ 0000000000000000 ... ]
PI ---> Queue->insertStartBarrierIfDiscardEventsMode(CommandList)
PI ---> EventCreate(Queue->Context, Queue, ForceHostVisible, Event)
PI ---> piEventRetain(*Event)
PI ---> piKernelRetain(Kernel)

Crashed while executing piEnqueueKernelLaunch for kernel _ZTSZZN2at15AtenIpexTypeXPUL20launch_legacy_kernelIZNS0_18dpcpp_loops_kernelIZZZNS_4impl21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE3_clEvENKUlvE9_clEvEUlN3c104HalfEE_Lb0ELb1EEEvRNS_18TensorIteratorBaseET_EUliE_EEvxRKSD_ENKUlRN4sycl3_V17handlerEE_clESK_EUlNSI_7nd_itemILi1EEEE_

Probably I should compile IPEX with driver 4885?

Nuullll, Oct 15 '23 04:10
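
The trace excerpt above ends inside a piEnqueueKernelLaunch call that never prints its pi_result line, which is how the crashing call was identified. A rough sketch of automating that check follows. It assumes the log layout matches the excerpt (calls open with "---> piName(" and close with ") ---> pi_result : ..."); the function name last_unfinished_call is made up for illustration, and output from several threads could interleave and confuse it.

```python
import re

# Patterns inferred from the trace.log excerpt in this thread (an assumption, not a spec).
CALL_START = re.compile(r"^---> (pi\w+)\(")
CALL_END = re.compile(r"^\) --->\s*pi_result : (\w+)")


def last_unfinished_call(trace_text):
    """Return the name of the last PI call that was opened but never reported a result."""
    open_call = None
    for line in trace_text.splitlines():
        start = CALL_START.match(line)
        if start:
            open_call = start.group(1)
        elif CALL_END.match(line):
            open_call = None
    return open_call


# Shortened sample in the same shape as the excerpt above.
SAMPLE = "\n".join([
    "---> piKernelCreate(",
    "\t<unknown> : 000001CA0D158570",
    "PI ---> piProgramRetain(Program)",
    ") ---> \tpi_result : PI_SUCCESS",
    "---> piEnqueueKernelLaunch(",
    "\t<unknown> : 000001C8A3717830",
    "PI ---> piKernelRetain(Kernel)",
])

print(last_unfinished_call(SAMPLE))
```

To run it on a real log, read the file with open("trace.log", errors="replace").read() and pass the text in.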

You could try and see, but the official wheels haven't been updated (afaik), so I don't think they were compiled against the latest drivers. Maybe the new drivers break something in AOT?

Mindset-Official, Oct 15 '23 13:10

I tried compiling IPEX AOT for Arc with driver 4887. The reproducer still crashes with the same SYCL_PI_TRACE log.

Nuullll, Oct 16 '23 04:10

Are there any updates on what's going on with the newest drivers? I personally haven't tried the very latest, but I have heard from others that it is also not working (I may give it a shot if someone says otherwise). Any progress on figuring out what's happening?

Mindset-Official, Oct 26 '23 02:10

I can confirm that drivers 4885, 4887 and 4900 all fail with IPEX AOT, simply because they ship the same Level Zero Compute Runtime "1.3.27193".

Nuullll, Oct 26 '23 02:10

I take it this is entirely at the driver level, and there is no way to override it and install the older runtime version?

Mindset-Official, Oct 26 '23 02:10

I tried replacing the driver storage files ze_intel_gpu64.dll, ze_loader.dll, ze_tracing_layer.dll and ze_validation_layer.dll under C:\Windows\System32 with the older DLLs, but I must have missed something: the compute runtime library failed to load.

Nuullll, Oct 26 '23 03:10

That's way above my level. However, I do not see a ze_intel_gpu64.dll in the main folder, only in one of the driver store repository folders. This is with driver 4676.

Mindset-Official, Oct 26 '23 03:10

Yes, correct: there are 4 ze_*.dll files in the driver storage folder and 3 in System32. I replaced them all but still had no luck :-(

Nuullll, Oct 26 '23 04:10
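
To see which ze_*.dll copies a machine actually has before attempting a swap like the one described above, a read-only listing can help. This is a sketch under assumptions: the function name find_level_zero_dlls is illustrative, and the DriverStore\FileRepository location is the usual Windows driver store path, which may differ from the folders the posters had in mind. It only lists files and changes nothing.

```python
import os
from pathlib import Path


def find_level_zero_dlls(locations):
    """List ze_*.dll files for (directory, recursive) pairs; missing directories are skipped."""
    found = []
    for directory, recursive in locations:
        directory = Path(directory)
        if not directory.is_dir():
            continue
        matches = directory.rglob("ze_*.dll") if recursive else directory.glob("ze_*.dll")
        found.extend(sorted(p for p in matches if p.is_file()))
    return found


if __name__ == "__main__":
    # Assumed default locations; adjust if Windows is installed elsewhere.
    system32 = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
    locations = [
        (system32, False),  # top level only
        (system32 / "DriverStore" / "FileRepository", True),  # search driver packages
    ]
    for dll in find_level_zero_dlls(locations):
        print(dll, dll.stat().st_size)
```

Comparing the file sizes (or timestamps) between the System32 copies and the driver store copies gives a quick hint as to whether they come from the same driver package.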

The issue is gone with driver 4952.

Nuullll, Nov 02 '23 09:11
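
As a recap of the driver builds mentioned in this thread, here is a small lookup sketch. The two sets contain only the builds that posters explicitly reported; the function names and the assumption that the build number is the last dot-separated field of the version string (for example "31.0.101.4885") are illustrative.

```python
# Builds explicitly reported in this thread for IPEX AOT on native Windows.
REPORTED_BROKEN = {4885, 4887, 4900}
REPORTED_WORKING = {4676, 4952}


def driver_build(version: str) -> int:
    """Extract the build number, assumed to be the last dot-separated field."""
    return int(version.strip().split(".")[-1])


def reported_status(version: str) -> str:
    """Say what this thread reported for a driver version, without extrapolating."""
    build = driver_build(version)
    if build in REPORTED_BROKEN:
        return "reported broken"
    if build in REPORTED_WORKING:
        return "reported working"
    return "not reported in this thread"


for v in ("31.0.101.4676", "31.0.101.4887", "4952", "31.0.101.5000"):
    print(v, "->", reported_status(v))
```

The helper deliberately does not guess about builds nobody tested, since the thread only covers the five listed.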