
Stuck at Sampling phase.

Open edom18 opened this issue 2 years ago • 9 comments

I've set this SD up on an EC2 instance. I've completed all the setup, but the command gets stuck at sampling.

The sampling progress never changes.

What's wrong?

Below is the command I executed.

The model I used is here.

python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt models/ldm/stable-diffusion-v2.1/model.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v2.1/model.ckpt
Global Step: 110000
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
LatentDiffusion: Running in v-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|                                                                                                                        | 0/3 [00:00<?, ?it/s]
data:   0%|                                                                                                                                | 0/1 [00:00<?, ?it/s]

edom18 avatar Feb 21 '23 15:02 edom18

+1 I also have the same problem.

tomanick avatar Feb 22 '23 17:02 tomanick

Me too, I have the same issue.

BigJoon avatar Feb 23 '23 08:02 BigJoon

Same issue here, upgraded from a working ldm (v1) environment.

Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling: 0%| | 0/3 [00:00<?, ?it/s]
data: 0%| | 0/1 [00:00<?, ?it/s]

simstodd avatar Feb 25 '23 18:02 simstodd

+1

ArashVahabpour avatar Feb 28 '23 00:02 ArashVahabpour

You're computing on the CPU. Try adding "--device cuda" if you have a GPU.
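
A sketch of that suggestion: first confirm PyTorch can actually see a GPU (on the CPU, a 768x768 v2.1 sample is slow enough that the progress bar can look frozen), then re-run the original command with the flag added. Arguments are copied from the report above; the availability check assumes PyTorch is installed in the active environment.

```shell
# Confirm PyTorch can see a GPU; prints True or False.
python -c "import torch; print(torch.cuda.is_available())"

# Re-run the original command on the GPU (arguments copied from the
# report above; --device selects the compute device in scripts/txt2img.py):
python scripts/txt2img.py \
    --prompt "a professional photograph of an astronaut riding a horse" \
    --ckpt models/ldm/stable-diffusion-v2.1/model.ckpt \
    --config configs/stable-diffusion/v2-inference-v.yaml \
    --H 768 --W 768 \
    --device cuda
```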

Dimbl4 avatar Mar 01 '23 16:03 Dimbl4

--device cuda

When I try that, I get:

RuntimeError: CUDA out of memory. Tried to allocate 9.49 GiB (GPU 0; 24.00 GiB total capacity; 14.72 GiB already allocated; 6.56 GiB free; 14.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This is on a 3090; normally this error means the wrong PyTorch version, but I just did the update/install from the homepage. This is on Windows for me, by the way.
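
A mitigation sketch based on what the error message itself suggests: cap the allocator's block size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation, and lower the batch size to cut peak VRAM. The flag name --n_samples assumes this repo's scripts/txt2img.py; adjust if your script differs.

```shell
# Limit allocator block size to reduce fragmentation, as the
# OOM message recommends:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

# Re-run with a smaller batch to lower peak VRAM usage
# (arguments otherwise copied from the original report):
python scripts/txt2img.py \
    --prompt "a professional photograph of an astronaut riding a horse" \
    --ckpt models/ldm/stable-diffusion-v2.1/model.ckpt \
    --config configs/stable-diffusion/v2-inference-v.yaml \
    --H 768 --W 768 \
    --device cuda \
    --n_samples 1
```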

StephanosPSteer avatar Mar 29 '23 07:03 StephanosPSteer

+1

ZXStudio avatar May 16 '23 08:05 ZXStudio

I guess the code currently does not support v-sampling: https://github.com/Stability-AI/stablediffusion/blob/main/ldm/models/diffusion/ddpm.py#L920

IceClear avatar Jun 08 '23 16:06 IceClear

--device cuda

when I try that I get RuntimeError: CUDA out of memory. Tried to allocate 9.49 GiB (GPU 0; 24.00 GiB total capacity; 14.72 GiB already allocated; 6.56 GiB free; 14.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

on a 3090, normally this error is wrong pytorch version, but I just did the update/install on the homepage. This is on windows for me by the way

This "RuntimeError: CUDA out of memory" issue is probably caused by the Nvidia display driver. When you install CUDA, you also install a display driver, and that driver has some issues, I guess. Use GeForce Experience to update the display driver after you install CUDA. In the GeForce Experience app, perform a clean install of the display driver by selecting custom install.

Shangkorong avatar Jun 16 '23 21:06 Shangkorong