stable-diffusion-webui-forge
[Bug]: Stable Video Diffusion seems TOO slow
Checklist
- [ ] The issue exists after disabling all extensions
- [ ] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [X] The issue exists in the current version of the webui
- [X] The issue has not been reported before recently
- [ ] The issue has been reported before but has not been fixed yet
What happened?
On a card with 6 GB of VRAM, generating a video with Forge takes much longer (28 minutes) than with ComfyUI (12 minutes).
Do I need to enable some optimization in my case? ComfyUI also uses PyTorch attention and is very fast, while Forge uses PyTorch attention too but takes far longer. Any hints?
Steps to reproduce the problem
Go to the SVD tab and generate a video.
What should have happened?
Forge should make the video a little faster than ComfyUI's SVD.
What browsers do you use to access the UI?
No response
Sysinfo
30/30 [28:42<00:00, 57.42s/it]
Console logs
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
left over keys: dict_keys(['conditioner.embedders.0.open_clip.model.ln_final.bias', 'conditioner.embedders.0.open_clip.model.ln_final.weight', 'conditioner.embedders.0.open_clip.model.logit_scale', 'conditioner.embedders.0.open_clip.model.positional_embedding', 'conditioner.embedders.0.open_clip.model.text_projection', 'conditioner.embedders.0.open_clip.model.token_embedding.weight', 'conditioner.embedders.3.encoder.decoder.conv_in.bias', 'conditioner.embedders.3.encoder.decoder.conv_in.weight', 'conditioner.embedders.3.encoder.decoder.conv_out.bias', 'conditioner.embedders.3.encoder.decoder.conv_out.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.k.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.k.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.norm.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.norm.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.proj_out.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.proj_out.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.q.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.q.weight', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.v.bias', 'conditioner.embedders.3.encoder.decoder.mid.attn_1.v.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.conv2.bias', 
'conditioner.embedders.3.encoder.decoder.mid.block_2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.mid.block_2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.norm_out.bias', 'conditioner.embedders.3.encoder.decoder.norm_out.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.conv2.weight', 
'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.0.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm2.bias', 
'conditioner.embedders.3.encoder.decoder.up.1.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.1.upsample.conv.bias', 'conditioner.embedders.3.encoder.decoder.up.1.upsample.conv.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.2.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.2.upsample.conv.bias', 'conditioner.embedders.3.encoder.decoder.up.2.upsample.conv.weight', 
'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.0.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.1.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.conv2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm1.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm1.weight', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm2.bias', 'conditioner.embedders.3.encoder.decoder.up.3.block.2.norm2.weight', 'conditioner.embedders.3.encoder.decoder.up.3.upsample.conv.bias', 'conditioner.embedders.3.encoder.decoder.up.3.upsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.conv_in.bias', 'conditioner.embedders.3.encoder.encoder.conv_in.weight', 'conditioner.embedders.3.encoder.encoder.conv_out.bias', 'conditioner.embedders.3.encoder.encoder.conv_out.weight', 
'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.0.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.0.block.1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.0.downsample.conv.bias', 'conditioner.embedders.3.encoder.encoder.down.0.downsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.0.norm2.weight', 
'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.1.block.1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.1.downsample.conv.bias', 'conditioner.embedders.3.encoder.encoder.down.1.downsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.nin_shortcut.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.nin_shortcut.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.0.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.2.block.1.norm2.weight', 
'conditioner.embedders.3.encoder.encoder.down.2.downsample.conv.bias', 'conditioner.embedders.3.encoder.encoder.down.2.downsample.conv.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.0.norm2.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.down.3.block.1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.k.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.k.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.norm.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.norm.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.proj_out.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.proj_out.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.q.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.q.weight', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.v.bias', 'conditioner.embedders.3.encoder.encoder.mid.attn_1.v.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.conv1.bias', 
'conditioner.embedders.3.encoder.encoder.mid.block_1.conv1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.conv2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_1.conv2.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm1.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_1.norm2.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv1.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.conv2.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm1.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm1.weight', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm2.bias', 'conditioner.embedders.3.encoder.encoder.mid.block_2.norm2.weight', 'conditioner.embedders.3.encoder.encoder.norm_out.bias', 'conditioner.embedders.3.encoder.encoder.norm_out.weight', 'conditioner.embedders.3.encoder.post_quant_conv.bias', 'conditioner.embedders.3.encoder.post_quant_conv.weight', 'conditioner.embedders.3.encoder.quant_conv.bias', 'conditioner.embedders.3.encoder.quant_conv.weight'])
To load target model CLIPVisionModelWithProjection
Begin to load 1 model
Moving model(s) has taken 0.51 seconds
To load target model AutoencodingEngine
Begin to load 1 model
Moving model(s) has taken 0.53 seconds
To load target model SVD_img2vid
Begin to load 1 model
Moving model(s) has taken 0.77 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [28:42<00:00, 57.42s/it]
To load target model AutoencodingEngine
Begin to load 1 model
Moving model(s) has taken 1.79 seconds
Installing imageio[pyav]
Additional information
Pinokio's SVD takes 45 minutes on a 6 GB VRAM card, while ComfyUI takes 12 to 15 minutes on the same card. I was hoping Forge would take the same time as ComfyUI or less.
Am I missing something in the launch arguments?
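To put the numbers side by side, here is a minimal Python sketch that reads the progress summary from the console log and compares it with the ComfyUI time I reported. The helper name `parse_tqdm_summary` is just something made up for this illustration, and the comparison is rough: the Forge figure covers only the 30 sampling steps, while the 12 minutes for ComfyUI is my overall estimate and may include model loading and decoding.

```python
import re


def parse_tqdm_summary(line: str) -> tuple[int, float]:
    """Return (total_steps, seconds_per_step) from a tqdm-style summary,
    e.g. '30/30 [28:42<00:00, 57.42s/it]'. Hypothetical helper for this issue."""
    match = re.search(r"(\d+)/(\d+) \[[\d:]+<[\d:]+, ([\d.]+)s/it\]", line)
    if match is None:
        raise ValueError(f"not a tqdm summary line: {line!r}")
    return int(match.group(2)), float(match.group(3))


# Progress line copied from the Forge console log above.
steps, forge_s_per_it = parse_tqdm_summary("30/30 [28:42<00:00, 57.42s/it]")

# Sampling time in Forge, in minutes.
forge_total_min = steps * forge_s_per_it / 60

# ComfyUI time as reported in this issue (lower bound of "12 to 15 minutes").
comfy_total_min = 12.0

slowdown = forge_total_min / comfy_total_min
print(f"Forge: {forge_total_min:.1f} min, ComfyUI: {comfy_total_min:.1f} min, "
      f"ratio: {slowdown:.2f}x")
```

By this estimate Forge is a bit more than twice as slow as ComfyUI on the same card.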