
Upscaling hangs (AMD issue?)

Open Vesek opened this issue 2 years ago • 4 comments

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Describe the problem
When I pick an image to be upscaled and click "Generate", my whole PC starts lagging and the upscale never actually finishes; it just hangs at `Upscaling image with shape (1408, 704, 3) ...`. I am on Arch Linux with 32 GB of RAM and 32 GB of swap; my GPU is an AMD Radeon RX 6700 XT.

Full Console Log

$ python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]
Fooocus version: 2.1.844
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 31203 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /home/vasek/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/vasek/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/vasek/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/vasek/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.38 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7858938471952840710
[Fooocus] Downloading upscale models ...
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 18 - 9
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] explorer, alien world, bioluminiscence, terrain, cinematic, epic, highly detailed, beautiful composition, intense, glowing, rich deep color, symmetry, stunning, brave, attractive, full background, creative, inspiring, thought, ambient light, intricate, majestic, glorious, illuminated, extremely aesthetic, fine detail, clear, crisp, sharp focus, bright
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] explorer, alien world, bioluminiscence, terrain, cinematic, highly detailed, epic composition, magical atmosphere, very inspirational, full color, intricate, elegant, dynamic, rich bright colors, perfect, sharp focus, beautiful, innocent, mystical, inspired, light, iconic, fine, extremely artistic, stunning, creative, deep background, new, amazing, attractive
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Fooocus] Image processing ...
[Fooocus] Upscaling image from (1408, 704) ...
Upscaling image with shape (1408, 704, 3) ...

It hangs here forever.

Vesek avatar Dec 16 '23 23:12 Vesek

I have a similar problem with upscaling operations (RX 580 8GB, ROCm 5.7.1, Linux). They do eventually complete, just extremely slowly (15+ minutes).

bureaum avatar Dec 20 '23 00:12 bureaum

Well, at that point I can just do it on my CPU.

I also tried fixing it with `--all-in-fp32`, because the same problem on 1111 can (supposedly) be fixed by `--no-half --precision full`, but that, somewhat predictably, just filled up my VRAM.

Vesek avatar Dec 21 '23 07:12 Vesek

This sadly seems very much like an AMD issue. Does swap offloading work for you @Vesek even when it lags, or is the app crashing at some point?

mashb1t avatar Dec 28 '23 23:12 mashb1t

I just tried running it for a LONG time, and it actually did eventually finish, just like @bureaum pointed out. While it was running, the RAM, VRAM, and swap usage was: (screenshots: vram, ram) After it eventually finished generating, the usage was: (screenshots: after_vram, after_ram) So it looks like swap was barely used. The GPU is clearly doing something, it just takes a long time.

Vesek avatar Dec 29 '23 11:12 Vesek

Are you getting your PyTorch ROCm build from Arch's repository or AMD's repository? In my experience, the binaries provided by AMD have a greater chance of working.

GZGavinZhao avatar Jan 02 '24 01:01 GZGavinZhao

I am using the ones from Arch's repo. In the past I also had issues with them, but now they work as they should. I'll try it anyway, though I'll first need to find a way to even do it.

Vesek avatar Jan 02 '24 02:01 Vesek

Ah okay, then the only thing you need to do is to follow the virtual env install directions and then go to the AMD install section to install AMD's PyTorch. That should be all you need.

GZGavinZhao avatar Jan 02 '24 02:01 GZGavinZhao

Well, that's what I was doing from the start. I thought you wanted me to install ROCm from the official repositories, which are only available for Ubuntu, RHEL, and SLES; installing that manually would have been a real pain.

Edit: Looks like the only way would be to build it from source.

Vesek avatar Jan 02 '24 02:01 Vesek

Oh, sorry for the misunderstanding. If the stable PyTorch doesn't work, would you mind trying the nightly wheels (which are built with newer ROCm versions and so will hopefully make things better) and seeing if that solves the problem? `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7`
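For anyone landing here later, the suggested steps might look like this (the venv path `fooocus_env` is an assumption; adjust it to wherever your Fooocus virtual environment lives):

```shell
# Activate the Fooocus virtual environment first (path is an assumption)
source fooocus_env/bin/activate

# Swap the stable PyTorch build for the ROCm 5.7 nightly wheels
pip3 install --pre torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/rocm5.7

# Quick sanity check: torch.version.hip is set on ROCm builds (None on CUDA builds)
python -c "import torch; print(torch.__version__, torch.version.hip)"
```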

GZGavinZhao avatar Jan 02 '24 03:01 GZGavinZhao

I can't believe what I am seeing, but it turns out that was a great idea, it just works! Thanks! Maybe it should be added to README.md.

Who would have thought that upgrading to a newer version of an unstable piece of software would make it less unstable...

Vesek avatar Jan 02 '24 03:01 Vesek

Glad I could help! Yeah, I agree that this may be something we want to add to the README, suggesting that users try installing the nightly PyTorch wheels as a last resort. In general, the newer the ROCm version you can get, the better, especially if you are on hardware released within the past two years.

GZGavinZhao avatar Jan 02 '24 03:01 GZGavinZhao

> I can't believe what I am seeing but turns out that was a great idea, it just works! Thanks! Maybe it should be added to README.md
>
> Who would have thought that upgrading to a newer version of a unstable piece of software will make it less unstable...

I can confirm, installing 5.7 solves the problem, at least for 2x upscaling.

TheNexter avatar Jan 17 '24 16:01 TheNexter