[Bug Report] Z-Image-Turbo: HIP error: illegal memory access on AMD ROCm 7.1 (gfx1201) RX 9070
Describe the bug
I am encountering a "HIP error: an illegal memory access was encountered" crash when attempting to run a generation with the Z-Image-Turbo model.
I am running on bleeding-edge AMD hardware (gfx1201 / RX9070) with ROCm 7.1 and PyTorch Nightly.
System Info
- OS: Linux (Ubuntu 24.04 LTS)
- GPU: AMD Radeon RX 9070 (gfx1201)
- ROCm Version: 7.1 (native)
- PyTorch Version: 2.10.0.dev20251123+rocm7.1
- Python: 3.12.3
- ComfyUI Version: 0.3.75
Console Logs
Total VRAM 16304 MB, total RAM 31692 MB
pytorch version: 2.10.0.dev20251123+rocm7.1
Set: torch.backends.cudnn.enabled = False for better AMD performance.
AMD arch: gfx1201
ROCm version: (7, 1)
...
Requested to load Lumina2
loaded completely; 13034.80 MB usable, 11739.55 MB loaded, full load: True
0%| | 0/4 [00:00<?, ?it/s]/home/asuna/AI/ComfyUI/comfy/k_diffusion/sampling.py:1391: UserWarning: HIP warning: an illegal memory access was encountered (Triggered internally at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:83.)
if sigma_down == 0 or old_denoised is None:
0%| | 0/4 [00:00<?, ?it/s]
!!! Exception during processing !!! HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://rocm.docs.amd.com/projects/HIP/en/latest/index.html for more information.
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
Traceback (most recent call last):
...
File "/home/asuna/AI/ComfyUI/comfy/k_diffusion/sampling.py", line 1429, in sample_res_multistep
return res_multistep(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, s_noise=s_noise, noise_sampler=noise_sampler, eta=0., cfg_pp=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asuna/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/asuna/AI/ComfyUI/comfy/k_diffusion/sampling.py", line 1391, in res_multistep
if sigma_down == 0 or old_denoised is None:
^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: an illegal memory access was encountered
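The error message above suggests serializing kernel launches to get a more accurate stack trace. One way to do that when launching ComfyUI (just a sketch, assuming the usual python main.py entry point; HIP_LAUNCH_BLOCKING is the HIP counterpart of CUDA_LAUNCH_BLOCKING) would be:
AMD_SERIALIZE_KERNEL=3 HIP_LAUNCH_BLOCKING=1 python main.py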
I get a very similar error trying to run Z-Image-Turbo, both BF16 and FP8 variants, on Fedora with ROCm 7.1 and corresponding PyTorch, on my Strix Halo (gfx1151) GPU.
Sadly, I think in this case the issue is with ROCm, not ComfyUI, so there's not much the team here, or any of us, can do until AMD get off their asses and actually provide proper software support for hardware that's been available to purchase for months.
+1 here, also running into this problem. I agree with DisturbedNeo though.. I think this is an issue with ROCm :/
I switched to Arch Linux and was able to generate images, but after two images I hit the same error again and couldn't generate any more. Perhaps all we can do is wait.
python -m pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre torch torchaudio torchvision
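After installing, a quick sanity check that the ROCm nightly is actually the build being used (run inside the ComfyUI venv; torch.version.hip is only set on ROCm builds) might look like:
python -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"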
I'm actually running fine on Windows 11 (the portable AMD build) with a 9070 XT. Generation is ~2x faster than my RTX 3060.
Total VRAM 16304 MB, total RAM 32694 MB
pytorch version: 2.10.0a0+rocm7.11.0a20251203
Set: torch.backends.cudnn.enabled = False for better AMD performance.
AMD arch: gfx1201
ROCm version: (7, 2)
Enabled fp16 accumulation.
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 9070 XT : native
Enabled pinned memory 14712.0
Using pytorch attention
Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr 8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.76
ComfyUI frontend version: 1.32.10
FP8 has never worked for me, though:
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
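For anyone else poking at the FP8 path: that log line shows the weights stored as float8_e4m3fn and manually cast to bfloat16 for compute. A minimal standalone check of the same store-as-fp8 / upcast-to-bf16 pattern (a sketch only, assuming a working torch+ROCm install; this is not ComfyUI's actual code) would be:
import torch

dev = "cuda"  # ROCm builds still expose the GPU under the "cuda" device name
x = torch.randn(16, 16, device=dev, dtype=torch.bfloat16)
# Store weights in fp8, then upcast to bf16 for the matmul (the "manual cast" idea)
w8 = torch.randn(16, 16, device=dev, dtype=torch.bfloat16).to(torch.float8_e4m3fn)
y = x @ w8.to(torch.bfloat16)
torch.cuda.synchronize()  # force any asynchronous HIP error to surface here
print(y.dtype, y.shape)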
Yes, this is an issue with ROCm 7.1.1.
Quick fix: add this to your kernel command line:
amdgpu.cwsr_enable=0
I got the fix from the ROCm GitHub:
https://github.com/ROCm/TheRock/issues/1795
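For anyone unsure how to apply that: on GRUB-based distros the usual approach (a sketch; exact commands and paths differ between Ubuntu and Fedora) is to append the parameter in /etc/default/grub, regenerate the config, and reboot:
# /etc/default/grub (append to the existing line)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.cwsr_enable=0"

sudo update-grub                                # Debian/Ubuntu
# sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # Fedora/openSUSE
sudo reboot

# Confirm after reboot:
grep -o 'amdgpu.cwsr_enable=0' /proc/cmdline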