stable-diffusion-webui
[Bug]: Intel Arc 770 and SDXL checkpoint generates garbage above 832x832 on 1.7 Release
Checklist
- [X] The issue exists after disabling all extensions
- [X] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [X] The issue exists in the current version of the webui
- [X] The issue has not been reported before recently
- [ ] The issue has been reported before but has not been fixed yet
What happened?
I did a fresh, clean, vanilla install of the 1.7 release and am using --use-ipex as the only startup flag. The optimization setting is on "Automatic" as well.
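For reference, the setup described above corresponds to a `webui-user.bat` along these lines (a minimal sketch using the standard webui launcher layout; nothing here beyond `--use-ipex` is taken from this report):

```bat
@echo off
rem Minimal webui-user.bat for an Intel Arc card via IPEX.
rem --use-ipex is the only command-line flag in use; the cross-attention
rem optimization is left on "Automatic" in Settings > Optimizations.
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--use-ipex

call webui.bat
```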
The images produced by the Arc A770 at 832x832 using SAI's SDXL Base 1.0 model are "okay", although generation speed is 2.5 s/it (much slower than others are reporting).
Anything above that produces garbage:
When I use the SAI 2.1 768 checkpoint at 896x896, the image loses coherency, but not as badly as above.
Steps to reproduce the problem
Start up auto1111 and generate at any resolution above 832x832 with the SDXL base model, then view the results.
What should have happened?
A properly generated image should be produced, as SDXL's native resolution is 1024x1024.
What browsers do you use to access the UI ?
Mozilla Firefox
Sysinfo
Console logs
venv "C:\Applications\StableDiffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --use-ipex
C:\Applications\StableDiffusion\stable-diffusion-webui\venv\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
no module 'xformers'. Processing without...
No SDP backend available, likely because you are running in pytorch versions < 2.0. In fact, you are using PyTorch 2.0.0a0+gite9ebda2. You might want to consider upgrading.
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3'
Style database not found: C:\Applications\StableDiffusion\stable-diffusion-webui\styles.csv
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
==============================================================================
You are running torch 2.0.0a0+gite9ebda2.
The program is tested to work with torch 2.0.0.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.
Use --skip-version-check commandline argument to disable this check.
==============================================================================
Loading weights [31e35c80fc] from C:\Applications\StableDiffusion\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Creating model from config: C:\Applications\StableDiffusion\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Startup time: 16.4s (prepare environment: 0.7s, import torch: 5.0s, import gradio: 1.7s, setup paths: 1.3s, initialize shared: 1.6s, other imports: 0.9s, setup codeformer: 0.2s, load scripts: 2.3s, create ui: 1.0s, gradio launch: 1.6s).
Applying attention optimization: InvokeAI... done.
Model loaded in 76.4s (load weights from disk: 2.2s, create model: 1.0s, apply weights to model: 14.2s, move model to device: 1.0s, load textual inversion embeddings: 45.6s, calculate empty prompt: 12.2s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:18<00:00, 3.91s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:08<00:00, 3.41s/it]
Additional information
Intel Driver version is from Nov 7, 2023
I had this too and it got better by using --medvram --medvram-sdxl. It's probably just the GPU running out of VRAM, but instead of giving an error, SD forces the generation to finish anyway, no matter the result.
Interesting, even for a 16 GB VRAM card? That is odd indeed; I'd have thought it would have enough VRAM. Seems like it doesn't matter: with the SDXL flag nothing generates, a message appears in the console to press any key, and it exits when you do. Using both --med* flags results in an error.
It is a known issue with the Arc driver that garbage images are generated at 1024x1024 resolution. Try 1080x1080 instead.
BTW, if you generate 512x512 images at batch size 4 (which is effectively 1024x1024?), you may see similar garbage outputs as well:
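The "effectively 1024x1024" point checks out arithmetically: four 512x512 images contain the same number of pixels, and (at SD's 8x VAE downscale) the same number of latent elements, as one 1024x1024 image. A quick sketch (the helper function is illustrative, not webui code):

```python
# Compare total latent-element counts for a batch of 512x512 images
# vs. a single 1024x1024 image. SD's VAE downscales by a factor of 8
# and the latent has 4 channels.
def latent_elems(width, height, batch=1, downscale=8, channels=4):
    return batch * channels * (width // downscale) * (height // downscale)

print(latent_elems(512, 512, batch=4))   # 65536
print(latent_elems(1024, 1024, batch=1)) # 65536 -- same workload size
```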
> The images produced by the ARC770 at 832x832 using SAI's Base 1.0 SDXL model are "okay" although generating speed is 2.5sec/it (much slower than others are reporting)
The InvokeAI cross-attention optimization is likely CPU-bottlenecked. Try SDP instead. There's also a PR https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14353 for SDP optimization; you may want to cherry-pick it into your local repo.
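Cherry-picking an open PR into a local checkout can be done by fetching the PR's head ref from GitHub (this uses GitHub's standard `pull/<id>/head` ref layout; inspect the commit before applying it):

```shell
# From inside the stable-diffusion-webui checkout:
# fetch the tip of PR #14353 from the upstream repository...
git fetch https://github.com/AUTOMATIC1111/stable-diffusion-webui pull/14353/head
# ...review what it contains, then apply it on top of the local branch
git log -1 FETCH_HEAD
git cherry-pick FETCH_HEAD
```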
> It is a known issue of Arc driver that garbage images are generated at 1024x1024 resolution. Try 1080x1080 instead.
Thanks, that worked. Is there a list of "forbidden" resolutions for the Arc?
> The images produced by the ARC770 at 832x832 using SAI's Base 1.0 SDXL model are "okay" although generating speed is 2.5sec/it (much slower than others are reporting)

> Invoke AI cross-attention optimization is likely CPU-bottlenecked. Try SDP instead. There's also a PR #14353 for sdp optimization, you may want to cherry-pick it to your local repo.
I tried SDP and Invoke at 1080x1080:
- SDP: ~1.9 s/it to ~2.2 s/it (usually on the higher end, over 2.1 or so)
- Invoke: SDP with no mem: hard BSOD every single time :)
- sub-quad: ~3 s/it to start, then dropped to ~1.9
The 3060 does 1080x1080 at 1.16 it/s, so the Arc is now only 2x as slow instead of 4x; it's a start. Some NaNs and -997 errors until restarting when switching from SDP to Invoke.
I still haven't popped it into the machine that has ReBAR, so I'll try that next, and maybe the PR fix too.
The 1.8 release produces garbage at 1000x1000 and up on an Arc A770 16GB with drivers updated as of 2024-03-31.