Terminal hangs and never finishes inference
Hi, thanks for your work.
I tried to run inference following the steps in the README, but my terminal gets stuck, as if it were in an infinite loop, right after the line "making attention of type 'vanilla' with 512 in_channels" is printed. There are no errors or meaningful warnings (see the log below).
I am testing inference on a GCP notebook instance that has a GPU with 15 GB of memory. Any help or suggestions would be appreciated.
Here is my command:
(stablesr) jupyter@my-nb:~/src/StableSR$ python scripts/sr_val_ddpm_text_T_vqganfin_old.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt stablesr_000117.ckpt --vqgan_ckpt vqgan_cfw_00011.ckpt --init-img inputs/test_example/baby_128.png --outdir outputs --ddpm_steps 10 --dec_w 0.5 --colorfix_type adain
Here is the log:
color correction>>>>>>>>>>> Use adain color correction
Loading model from vqgan_cfw_00011.ckpt
Global Step: 18000
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
/opt/conda/envs/stablesr/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/opt/conda/envs/stablesr/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
missing>>>>>>>>>>>>>>>>>>> [] trainable_list>>>>>>>>>>>>>>>>>>> ['decoder.fusion_layer_2.encode_enc_1.norm1.weight', 'decoder.fusion_layer_2.encode_enc_1.norm1.bias', 'decoder.fusion_layer_2.encode_enc_1.conv1.weight', 'decoder.fusion_layer_2.encode_enc_1.conv1.bias', 'decoder.fusion_layer_2.encode_enc_1.norm2.weight', 'decoder.fusion_layer_2.encode_enc_1.norm2.bias', 'decoder.fusion_layer_2.encode_enc_1.conv2.weight', 'decoder.fusion_layer_2.encode_enc_1.conv2.bias', 'decoder.fusion_layer_2.encode_enc_1.conv_out.weight', 'decoder.fusion_layer_2.encode_enc_1.conv_out.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv1.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv1.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv2.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv2.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv3.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv3.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv4.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv4.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv5.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb1.conv5.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv1.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv1.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv2.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv2.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv3.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv3.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv4.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv4.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv5.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb2.conv5.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv1.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv1.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv2.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv2.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv3.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv3.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv4.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv4.bias', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv5.weight', 'decoder.fusion_layer_2.encode_enc_2.0.rdb3.conv5.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv1.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv1.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv2.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv2.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv3.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv3.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv4.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv4.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv5.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb1.conv5.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv1.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv1.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv2.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv2.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv3.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv3.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv4.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv4.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv5.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb2.conv5.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv1.weight', 
'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv1.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv2.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv2.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv3.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv3.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv4.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv4.bias', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv5.weight', 'decoder.fusion_layer_2.encode_enc_2.1.rdb3.conv5.bias', 'decoder.fusion_layer_2.encode_enc_3.norm1.weight', 'decoder.fusion_layer_2.encode_enc_3.norm1.bias', 'decoder.fusion_layer_2.encode_enc_3.conv1.weight', 'decoder.fusion_layer_2.encode_enc_3.conv1.bias', 'decoder.fusion_layer_2.encode_enc_3.norm2.weight', 'decoder.fusion_layer_2.encode_enc_3.norm2.bias', 'decoder.fusion_layer_2.encode_enc_3.conv2.weight', 'decoder.fusion_layer_2.encode_enc_3.conv2.bias', 'decoder.fusion_layer_1.encode_enc_1.norm1.weight', 'decoder.fusion_layer_1.encode_enc_1.norm1.bias', 'decoder.fusion_layer_1.encode_enc_1.conv1.weight', 'decoder.fusion_layer_1.encode_enc_1.conv1.bias', 'decoder.fusion_layer_1.encode_enc_1.norm2.weight', 'decoder.fusion_layer_1.encode_enc_1.norm2.bias', 'decoder.fusion_layer_1.encode_enc_1.conv2.weight', 'decoder.fusion_layer_1.encode_enc_1.conv2.bias', 'decoder.fusion_layer_1.encode_enc_1.conv_out.weight', 'decoder.fusion_layer_1.encode_enc_1.conv_out.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv1.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv1.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv2.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv2.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv3.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv3.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv4.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv4.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv5.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb1.conv5.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv1.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv1.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv2.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv2.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv3.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv3.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv4.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv4.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv5.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb2.conv5.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv1.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv1.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv2.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv2.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv3.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv3.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv4.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv4.bias', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv5.weight', 'decoder.fusion_layer_1.encode_enc_2.0.rdb3.conv5.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv1.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv1.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv2.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv2.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv3.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv3.bias', 
'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv4.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv4.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv5.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb1.conv5.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv1.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv1.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv2.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv2.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv3.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv3.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv4.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv4.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv5.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb2.conv5.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv1.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv1.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv2.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv2.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv3.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv3.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv4.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv4.bias', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv5.weight', 'decoder.fusion_layer_1.encode_enc_2.1.rdb3.conv5.bias', 'decoder.fusion_layer_1.encode_enc_3.norm1.weight', 'decoder.fusion_layer_1.encode_enc_3.norm1.bias', 'decoder.fusion_layer_1.encode_enc_3.conv1.weight', 'decoder.fusion_layer_1.encode_enc_3.conv1.bias', 'decoder.fusion_layer_1.encode_enc_3.norm2.weight', 'decoder.fusion_layer_1.encode_enc_3.norm2.bias', 'decoder.fusion_layer_1.encode_enc_3.conv2.weight', 'decoder.fusion_layer_1.encode_enc_3.conv2.bias', 'loss.discriminator.main.0.weight', 'loss.discriminator.main.0.bias', 'loss.discriminator.main.2.weight', 'loss.discriminator.main.3.weight', 'loss.discriminator.main.3.bias', 'loss.discriminator.main.5.weight', 'loss.discriminator.main.6.weight', 'loss.discriminator.main.6.bias', 'loss.discriminator.main.8.weight', 'loss.discriminator.main.9.weight', 'loss.discriminator.main.9.bias', 'loss.discriminator.main.11.weight', 'loss.discriminator.main.11.bias'] Untrainable_list>>>>>>>>>>>>>>>>>>> ['encoder.conv_in.weight', 'encoder.conv_in.bias', 'encoder.down.0.block.0.norm1.weight', 'encoder.down.0.block.0.norm1.bias', 'encoder.down.0.block.0.conv1.weight', 'encoder.down.0.block.0.conv1.bias', 'encoder.down.0.block.0.norm2.weight', 'encoder.down.0.block.0.norm2.bias', 'encoder.down.0.block.0.conv2.weight', 'encoder.down.0.block.0.conv2.bias', 'encoder.down.0.block.1.norm1.weight', 'encoder.down.0.block.1.norm1.bias', 'encoder.down.0.block.1.conv1.weight', 'encoder.down.0.block.1.conv1.bias', 'encoder.down.0.block.1.norm2.weight', 'encoder.down.0.block.1.norm2.bias', 'encoder.down.0.block.1.conv2.weight', 'encoder.down.0.block.1.conv2.bias', 'encoder.down.0.downsample.conv.weight', 'encoder.down.0.downsample.conv.bias', 'encoder.down.1.block.0.norm1.weight', 'encoder.down.1.block.0.norm1.bias', 'encoder.down.1.block.0.conv1.weight', 'encoder.down.1.block.0.conv1.bias', 'encoder.down.1.block.0.norm2.weight', 'encoder.down.1.block.0.norm2.bias', 'encoder.down.1.block.0.conv2.weight', 'encoder.down.1.block.0.conv2.bias', 'encoder.down.1.block.0.nin_shortcut.weight', 'encoder.down.1.block.0.nin_shortcut.bias', 'encoder.down.1.block.1.norm1.weight', 'encoder.down.1.block.1.norm1.bias', 'encoder.down.1.block.1.conv1.weight', 
'encoder.down.1.block.1.conv1.bias', 'encoder.down.1.block.1.norm2.weight', 'encoder.down.1.block.1.norm2.bias', 'encoder.down.1.block.1.conv2.weight', 'encoder.down.1.block.1.conv2.bias', 'encoder.down.1.downsample.conv.weight', 'encoder.down.1.downsample.conv.bias', 'encoder.down.2.block.0.norm1.weight', 'encoder.down.2.block.0.norm1.bias', 'encoder.down.2.block.0.conv1.weight', 'encoder.down.2.block.0.conv1.bias', 'encoder.down.2.block.0.norm2.weight', 'encoder.down.2.block.0.norm2.bias', 'encoder.down.2.block.0.conv2.weight', 'encoder.down.2.block.0.conv2.bias', 'encoder.down.2.block.0.nin_shortcut.weight', 'encoder.down.2.block.0.nin_shortcut.bias', 'encoder.down.2.block.1.norm1.weight', 'encoder.down.2.block.1.norm1.bias', 'encoder.down.2.block.1.conv1.weight', 'encoder.down.2.block.1.conv1.bias', 'encoder.down.2.block.1.norm2.weight', 'encoder.down.2.block.1.norm2.bias', 'encoder.down.2.block.1.conv2.weight', 'encoder.down.2.block.1.conv2.bias', 'encoder.down.2.downsample.conv.weight', 'encoder.down.2.downsample.conv.bias', 'encoder.down.3.block.0.norm1.weight', 'encoder.down.3.block.0.norm1.bias', 'encoder.down.3.block.0.conv1.weight', 'encoder.down.3.block.0.conv1.bias', 'encoder.down.3.block.0.norm2.weight', 'encoder.down.3.block.0.norm2.bias', 'encoder.down.3.block.0.conv2.weight', 'encoder.down.3.block.0.conv2.bias', 'encoder.down.3.block.1.norm1.weight', 'encoder.down.3.block.1.norm1.bias', 'encoder.down.3.block.1.conv1.weight', 'encoder.down.3.block.1.conv1.bias', 'encoder.down.3.block.1.norm2.weight', 'encoder.down.3.block.1.norm2.bias', 'encoder.down.3.block.1.conv2.weight', 'encoder.down.3.block.1.conv2.bias', 'encoder.mid.block_1.norm1.weight', 'encoder.mid.block_1.norm1.bias', 'encoder.mid.block_1.conv1.weight', 'encoder.mid.block_1.conv1.bias', 'encoder.mid.block_1.norm2.weight', 'encoder.mid.block_1.norm2.bias', 'encoder.mid.block_1.conv2.weight', 'encoder.mid.block_1.conv2.bias', 'encoder.mid.attn_1.norm.weight', 'encoder.mid.attn_1.norm.bias', 'encoder.mid.attn_1.q.weight', 'encoder.mid.attn_1.q.bias', 'encoder.mid.attn_1.k.weight', 'encoder.mid.attn_1.k.bias', 'encoder.mid.attn_1.v.weight', 'encoder.mid.attn_1.v.bias', 'encoder.mid.attn_1.proj_out.weight', 'encoder.mid.attn_1.proj_out.bias', 'encoder.mid.block_2.norm1.weight', 'encoder.mid.block_2.norm1.bias', 'encoder.mid.block_2.conv1.weight', 'encoder.mid.block_2.conv1.bias', 'encoder.mid.block_2.norm2.weight', 'encoder.mid.block_2.norm2.bias', 'encoder.mid.block_2.conv2.weight', 'encoder.mid.block_2.conv2.bias', 'encoder.norm_out.weight', 'encoder.norm_out.bias', 'encoder.conv_out.weight', 'encoder.conv_out.bias', 'decoder.conv_in.weight', 'decoder.conv_in.bias', 'decoder.mid.block_1.norm1.weight', 'decoder.mid.block_1.norm1.bias', 'decoder.mid.block_1.conv1.weight', 'decoder.mid.block_1.conv1.bias', 'decoder.mid.block_1.norm2.weight', 'decoder.mid.block_1.norm2.bias', 'decoder.mid.block_1.conv2.weight', 'decoder.mid.block_1.conv2.bias', 'decoder.mid.attn_1.norm.weight', 'decoder.mid.attn_1.norm.bias', 'decoder.mid.attn_1.q.weight', 'decoder.mid.attn_1.q.bias', 'decoder.mid.attn_1.k.weight', 'decoder.mid.attn_1.k.bias', 'decoder.mid.attn_1.v.weight', 'decoder.mid.attn_1.v.bias', 'decoder.mid.attn_1.proj_out.weight', 'decoder.mid.attn_1.proj_out.bias', 'decoder.mid.block_2.norm1.weight', 'decoder.mid.block_2.norm1.bias', 'decoder.mid.block_2.conv1.weight', 'decoder.mid.block_2.conv1.bias', 'decoder.mid.block_2.norm2.weight', 'decoder.mid.block_2.norm2.bias', 'decoder.mid.block_2.conv2.weight', 
'decoder.mid.block_2.conv2.bias', 'decoder.up.0.block.0.norm1.weight', 'decoder.up.0.block.0.norm1.bias', 'decoder.up.0.block.0.conv1.weight', 'decoder.up.0.block.0.conv1.bias', 'decoder.up.0.block.0.norm2.weight', 'decoder.up.0.block.0.norm2.bias', 'decoder.up.0.block.0.conv2.weight', 'decoder.up.0.block.0.conv2.bias', 'decoder.up.0.block.0.nin_shortcut.weight', 'decoder.up.0.block.0.nin_shortcut.bias', 'decoder.up.0.block.1.norm1.weight', 'decoder.up.0.block.1.norm1.bias', 'decoder.up.0.block.1.conv1.weight', 'decoder.up.0.block.1.conv1.bias', 'decoder.up.0.block.1.norm2.weight', 'decoder.up.0.block.1.norm2.bias', 'decoder.up.0.block.1.conv2.weight', 'decoder.up.0.block.1.conv2.bias', 'decoder.up.0.block.2.norm1.weight', 'decoder.up.0.block.2.norm1.bias', 'decoder.up.0.block.2.conv1.weight', 'decoder.up.0.block.2.conv1.bias', 'decoder.up.0.block.2.norm2.weight', 'decoder.up.0.block.2.norm2.bias', 'decoder.up.0.block.2.conv2.weight', 'decoder.up.0.block.2.conv2.bias', 'decoder.up.1.block.0.norm1.weight', 'decoder.up.1.block.0.norm1.bias', 'decoder.up.1.block.0.conv1.weight', 'decoder.up.1.block.0.conv1.bias', 'decoder.up.1.block.0.norm2.weight', 'decoder.up.1.block.0.norm2.bias', 'decoder.up.1.block.0.conv2.weight', 'decoder.up.1.block.0.conv2.bias', 'decoder.up.1.block.0.nin_shortcut.weight', 'decoder.up.1.block.0.nin_shortcut.bias', 'decoder.up.1.block.1.norm1.weight', 'decoder.up.1.block.1.norm1.bias', 'decoder.up.1.block.1.conv1.weight', 'decoder.up.1.block.1.conv1.bias', 'decoder.up.1.block.1.norm2.weight', 'decoder.up.1.block.1.norm2.bias', 'decoder.up.1.block.1.conv2.weight', 'decoder.up.1.block.1.conv2.bias', 'decoder.up.1.block.2.norm1.weight', 'decoder.up.1.block.2.norm1.bias', 'decoder.up.1.block.2.conv1.weight', 'decoder.up.1.block.2.conv1.bias', 'decoder.up.1.block.2.norm2.weight', 'decoder.up.1.block.2.norm2.bias', 'decoder.up.1.block.2.conv2.weight', 'decoder.up.1.block.2.conv2.bias', 'decoder.up.1.upsample.conv.weight', 'decoder.up.1.upsample.conv.bias', 'decoder.up.2.block.0.norm1.weight', 'decoder.up.2.block.0.norm1.bias', 'decoder.up.2.block.0.conv1.weight', 'decoder.up.2.block.0.conv1.bias', 'decoder.up.2.block.0.norm2.weight', 'decoder.up.2.block.0.norm2.bias', 'decoder.up.2.block.0.conv2.weight', 'decoder.up.2.block.0.conv2.bias', 'decoder.up.2.block.1.norm1.weight', 'decoder.up.2.block.1.norm1.bias', 'decoder.up.2.block.1.conv1.weight', 'decoder.up.2.block.1.conv1.bias', 'decoder.up.2.block.1.norm2.weight', 'decoder.up.2.block.1.norm2.bias', 'decoder.up.2.block.1.conv2.weight', 'decoder.up.2.block.1.conv2.bias', 'decoder.up.2.block.2.norm1.weight', 'decoder.up.2.block.2.norm1.bias', 'decoder.up.2.block.2.conv1.weight', 'decoder.up.2.block.2.conv1.bias', 'decoder.up.2.block.2.norm2.weight', 'decoder.up.2.block.2.norm2.bias', 'decoder.up.2.block.2.conv2.weight', 'decoder.up.2.block.2.conv2.bias', 'decoder.up.2.upsample.conv.weight', 'decoder.up.2.upsample.conv.bias', 'decoder.up.3.block.0.norm1.weight', 'decoder.up.3.block.0.norm1.bias', 'decoder.up.3.block.0.conv1.weight', 'decoder.up.3.block.0.conv1.bias', 'decoder.up.3.block.0.norm2.weight', 'decoder.up.3.block.0.norm2.bias', 'decoder.up.3.block.0.conv2.weight', 'decoder.up.3.block.0.conv2.bias', 'decoder.up.3.block.1.norm1.weight', 'decoder.up.3.block.1.norm1.bias', 'decoder.up.3.block.1.conv1.weight', 'decoder.up.3.block.1.conv1.bias', 'decoder.up.3.block.1.norm2.weight', 'decoder.up.3.block.1.norm2.bias', 'decoder.up.3.block.1.conv2.weight', 'decoder.up.3.block.1.conv2.bias', 'decoder.up.3.block.2.norm1.weight', 
'decoder.up.3.block.2.norm1.bias', 'decoder.up.3.block.2.conv1.weight', 'decoder.up.3.block.2.conv1.bias', 'decoder.up.3.block.2.norm2.weight', 'decoder.up.3.block.2.norm2.bias', 'decoder.up.3.block.2.conv2.weight', 'decoder.up.3.block.2.conv2.bias', 'decoder.up.3.upsample.conv.weight', 'decoder.up.3.upsample.conv.bias', 'decoder.norm_out.weight', 'decoder.norm_out.bias', 'decoder.conv_out.weight', 'decoder.conv_out.bias', 'loss.logvar', 'loss.perceptual_loss.net.slice1.0.weight', 'loss.perceptual_loss.net.slice1.0.bias', 'loss.perceptual_loss.net.slice1.2.weight', 'loss.perceptual_loss.net.slice1.2.bias', 'loss.perceptual_loss.net.slice2.5.weight', 'loss.perceptual_loss.net.slice2.5.bias', 'loss.perceptual_loss.net.slice2.7.weight', 'loss.perceptual_loss.net.slice2.7.bias', 'loss.perceptual_loss.net.slice3.10.weight', 'loss.perceptual_loss.net.slice3.10.bias', 'loss.perceptual_loss.net.slice3.12.weight', 'loss.perceptual_loss.net.slice3.12.bias', 'loss.perceptual_loss.net.slice3.14.weight', 'loss.perceptual_loss.net.slice3.14.bias', 'loss.perceptual_loss.net.slice4.17.weight', 'loss.perceptual_loss.net.slice4.17.bias', 'loss.perceptual_loss.net.slice4.19.weight', 'loss.perceptual_loss.net.slice4.19.bias', 'loss.perceptual_loss.net.slice4.21.weight', 'loss.perceptual_loss.net.slice4.21.bias', 'loss.perceptual_loss.net.slice5.24.weight', 'loss.perceptual_loss.net.slice5.24.bias', 'loss.perceptual_loss.net.slice5.26.weight', 'loss.perceptual_loss.net.slice5.26.bias', 'loss.perceptual_loss.net.slice5.28.weight', 'loss.perceptual_loss.net.slice5.28.bias', 'loss.perceptual_loss.lin0.model.1.weight', 'loss.perceptual_loss.lin1.model.1.weight', 'loss.perceptual_loss.lin2.model.1.weight', 'loss.perceptual_loss.lin3.model.1.weight', 'loss.perceptual_loss.lin4.model.1.weight', 'quant_conv.weight', 'quant_conv.bias', 'post_quant_conv.weight', 'post_quant_conv.bias']
Global seed set to 42
Loading model from stablesr_000117.ckpt
Global Step: 16500
LatentDiffusionSRTextWT: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 918.93 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 64, 64) = 16384 dimensions.
making attention of type 'vanilla' with 512 in_channels
It seems that your CPU RAM is not enough. You need at least 18 GB of CPU RAM to run the inference.
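For reference, a quick way to check how much system (CPU) RAM the notebook VM actually has before launching the script. This is only a sketch and assumes psutil is installed, which is not part of the StableSR requirements:

import psutil

# Report total and currently available system (CPU) RAM in GiB.
mem = psutil.virtual_memory()
print(f"Total CPU RAM:     {mem.total / 2**30:.1f} GiB")
print(f"Available CPU RAM: {mem.available / 2**30:.1f} GiB")

If the available figure is well below 18 GiB, the loading step will stall or be killed rather than raise a clear error, which matches the behaviour described above.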
Thank you for your quick response! I have now tried another GPU with 24 GB, but I seem to be running into the same problem.
Hi. I mean CPU RAM, not GPU memory :)
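To make the distinction concrete: the 15 GB and 24 GB figures above are GPU VRAM, which you can read out with PyTorch as in the sketch below, while the 18 GB requirement refers to ordinary system RAM (the value reported by the psutil check earlier). Since the checkpoints are presumably loaded into CPU memory before anything is moved to the GPU, a larger GPU alone will not fix the hang.

import torch

# GPU VRAM of the first CUDA device -- this is the 15 GB / 24 GB figure,
# not the CPU RAM that the 18 GB requirement refers to.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 2**30:.1f} GiB")
else:
    print("No CUDA device visible")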