stable-diffusion-webui-forge

[Bug]: Stable Video Diffusion seems TOO slow

Open LIQUIDMIND111 opened this issue 1 year ago • 0 comments

Checklist

  • [ ] The issue exists after disabling all extensions
  • [ ] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [X] The issue exists in the current version of the webui
  • [X] The issue has not been reported before recently
  • [ ] The issue has been reported before but has not been fixed yet

What happened?

On a card with 6 GB of VRAM, creating a video with Forge takes too long (28 minutes) compared to ComfyUI (12 minutes).

Do I need to enable some optimization in my case? ComfyUI also uses PyTorch attention and is very fast, while Forge uses PyTorch attention too but takes much longer. Any hints?

Steps to reproduce the problem

Go to the SVD tab and generate a video.

What should have happened?

Forge should make the video a little faster than ComfyUI's SVD.

What browsers do you use to access the UI ?

No response

Sysinfo

30/30 [28:42<00:00, 57.42s/it]

Console logs

Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
left over keys: dict_keys(['conditioner.embedders.0.open_clip.model.ln_final.bias', 'conditioner.embedders.0.open_clip.model.ln_final.weight', 'conditioner.embedders.0.open_clip.model.logit_scale', 'conditioner.embedders.0.open_clip.model.positional_embedding', 'conditioner.embedders.0.open_clip.model.text_projection', 'conditioner.embedders.0.open_clip.model.token_embedding.weight', 'conditioner.embedders.3.encoder.decoder.conv_in.bias', 'conditioner.embedders.3.encoder.decoder.conv_in.weight', 'conditioner.embedders.3.encoder.decoder.conv_out.bias', 'conditioner.embedders.3.encoder.decoder.conv_out.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.k.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.k.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.norm.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.norm.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.proj_out.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.proj_out.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.q.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.q.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.v.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.v.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.conv2.bias', 
'conditioner.embedders.3.encoder.decoder.mid.block_2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.norm_out.bias', 'conditioner.embedders.3.encoder.decoder.norm_out.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv2.weight', 
'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm2.bias', 
'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.upsample.conv.bias', 'conditioner.embedders.3.encoder.decoder.up.1.upsample.conv.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.upsample.conv.bias', 'conditioner.embedders.3.encoder.decoder.up.2.upsample.conv.weight', 
'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.upsample.conv.bias', 'conditioner.embedders.3.encoder.decoder.up.3.upsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.conv_in.bias', 'conditioner.embedders.3.encoder.encoder.conv_in.weight', 'conditioner.embedders.3.encoder.encoder.conv_out.bias', 'conditioner.embedders.3.encoder.encoder.conv_out.weight', 
'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.downsample.conv.bias', 'conditioner.embedders.3.encoder.encoder.down.0.downsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm2.weight', 
'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.1.downsample.conv.bias', 'conditioner.embedders.3.encoder.encoder.down.1.downsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm2.weight', 
'conditioner.embedders.3.encoder.encoder.down.2.downsample.conv.bias', 'conditioner.embedders.3.encoder.encoder.down.2.downsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.k.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.k.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.norm.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.norm.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.proj_out.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.proj_out.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.q.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.q.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.v.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.v.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.conv1.bias', 
'conditioner.embedders.3.encoder.encoder.mid.block_1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv1.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv2.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm1.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm2.weight', 'conditioner.embedders.3.encoder.encoder.norm_out.bias', 'conditioner.embedders.3.encoder.encoder.norm_out.weight', 'conditioner.embedders.3.encoder.post_quant_conv.bias', 'conditioner.embedders.3.encoder.post_quant_conv.weight', 'conditioner.embedders.3.encoder.quant_conv.bias', 'conditioner.embedders.3.encoder.quant_conv.weight'])
To load target model CLIPVisionModelWithProjection
Begin to load 1 model
Moving model(s) has taken 0.51 seconds
To load target model AutoencodingEngine
Begin to load 1 model
Moving model(s) has taken 0.53 seconds
To load target model SVD_img2vid
Begin to load 1 model
Moving model(s) has taken 0.77 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [28:42<00:00, 57.42s/it]
To load target model AutoencodingEngine
Begin to load 1 model
Moving model(s) has taken 1.79 seconds
Installing imageio[pyav]

Additional information

Pinokio SVD takes 45 minutes on a 6 GB VRAM card, and ComfyUI takes 12 to 15 minutes on the same card. I was hoping Forge would take the same time as ComfyUI or less.

Am I missing something in the arguments?
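To put the numbers above side by side, here is a rough Python sketch that parses the progress line from the console log and compares it with the ComfyUI time. It assumes the ComfyUI run used the same 30 sampling steps and that its 12 minutes is the whole run, neither of which was measured precisely, so treat the per-step figure for ComfyUI as an estimate only. The helper name `parse_tqdm` is made up for this example.

```python
import re

def parse_tqdm(line: str) -> tuple[int, float]:
    """Return (total_steps, seconds_per_iteration) from a tqdm progress line."""
    m = re.search(r"(\d+)/(\d+) \[.*?,\s*([\d.]+)s/it\]", line)
    if m is None:
        raise ValueError("not a tqdm progress line reporting s/it")
    return int(m.group(2)), float(m.group(3))

# Progress line copied from the Forge console log in this issue.
forge_line = "30/30 [28:42<00:00, 57.42s/it]"
steps, forge_s_per_it = parse_tqdm(forge_line)

# Total sampling time in Forge, in minutes.
forge_total_min = steps * forge_s_per_it / 60

# ComfyUI figure reported in this issue (lower bound of "12 to 15 minutes").
comfy_total_min = 12.0

# ASSUMPTION: ComfyUI used the same number of steps and the 12 minutes
# is dominated by sampling. Neither is confirmed by the logs.
comfy_s_per_it = comfy_total_min * 60 / steps

slowdown = forge_total_min / comfy_total_min

print(f"Forge:   {forge_s_per_it:.2f} s/it, {forge_total_min:.1f} min total")
print(f"ComfyUI: {comfy_s_per_it:.2f} s/it (estimated), {comfy_total_min:.1f} min total")
print(f"Forge is roughly {slowdown:.1f}x slower")
```

Under those assumptions Forge spends a bit more than twice as long per step as ComfyUI on the same card, which is the gap this issue is asking about.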

LIQUIDMIND111 · Feb 19 '24 03:02