stable-diffusion icon indicating copy to clipboard operation
stable-diffusion copied to clipboard

semantic-driven image synthesis failed

Open HUuxiaobin opened this issue 2 years ago • 5 comments

Dear author,

Thanks so much for your great contribution in image synthesis community. Inspired by your semantic-to-image demo in figure 8 of paper, I would like to reshow the figure 8 results to demonstrate the effectiveness of semantic-to-image synthesis. But I failed to get the realistic image from the semantic when I use the zoom-in semantic image of figure 8 and the prompt "A fantasy lake under the sunset'' with the checkpoint of stable-diffusion-v1-1. I am wondering if you could help me to fix this problem and then enlarge your great influence in segmentation domain. Have a nice day!

best TU Munich

HUuxiaobin avatar Sep 09 '22 07:09 HUuxiaobin

I'm not sure, but have you tried the models semantic_synthesis512 and semantic_synthesis256 from https://github.com/CompVis/stable-diffusion/blob/main/scripts/download_models.sh ?

KostyaAtarik avatar Sep 09 '22 07:09 KostyaAtarik

Thanks for your quick reply. I run the segmentation-to-image script 'python scripts/img2img.py --prompt "A fantasy lake under the sunset" --init-img ./mask/case2.png --strength 0.8' and the models are from semantic_synthesis256 model and a 512 model. The error occurred as follows: m, u = model.load_state_dict(sd, strict=False) File "/root/anaconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for LatentDiffusion: size mismatch for model.diffusion_model.time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1280, 320]). size mismatch for model.diffusion_model.time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([128, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]). size mismatch for model.diffusion_model.input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([320, 1280]). size mismatch for model.diffusion_model.input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([320, 1280]). size mismatch for model.diffusion_model.input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.3.0.op.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.3.0.op.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 320, 3, 3]). size mismatch for model.diffusion_model.input_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.input_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 320, 1, 1]). size mismatch for model.diffusion_model.input_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.input_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.6.0.op.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.6.0.op.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 640, 3, 3]). size mismatch for model.diffusion_model.input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 640, 1, 1]). size mismatch for model.diffusion_model.input_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.input_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.input_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.input_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.middle_block.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([1024, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.2.1.conv.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.2.1.conv.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for model.diffusion_model.output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for model.diffusion_model.output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]). size mismatch for model.diffusion_model.output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for model.diffusion_model.output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([512, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]). size mismatch for model.diffusion_model.output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]). size mismatch for model.diffusion_model.output_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.output_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.output_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([128, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]). size mismatch for model.diffusion_model.output_blocks.6.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]). size mismatch for model.diffusion_model.output_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.output_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.output_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]). size mismatch for model.diffusion_model.output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for model.diffusion_model.output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for model.diffusion_model.output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]). size mismatch for model.diffusion_model.output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for model.diffusion_model.out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([320]). size mismatch for model.diffusion_model.out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([4, 320, 3, 3]). size mismatch for model.diffusion_model.out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]). size mismatch for first_stage_model.encoder.conv_out.weight: copying a param with shape torch.Size([3, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 512, 3, 3]). size mismatch for first_stage_model.encoder.conv_out.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([8]). size mismatch for first_stage_model.decoder.conv_in.weight: copying a param with shape torch.Size([512, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 4, 3, 3]). size mismatch for first_stage_model.quant_conv.weight: copying a param with shape torch.Size([3, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 8, 1, 1]). size mismatch for first_stage_model.quant_conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([8]). size mismatch for first_stage_model.post_quant_conv.weight: copying a param with shape torch.Size([3, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([4, 4, 1, 1]). size mismatch for first_stage_model.post_quant_conv.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).

Is this problem of first-stage-model? which model should I use in first stage ?

wget -O models/first_stage_models/kl-f4/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f4.zip wget -O models/first_stage_models/kl-f8/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f8.zip wget -O models/first_stage_models/kl-f16/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f16.zip wget -O models/first_stage_models/kl-f32/model.zip https://ommer-lab.com/files/latent-diffusion/kl-f32.zip wget -O models/first_stage_models/vq-f4/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4.zip wget -O models/first_stage_models/vq-f4-noattn/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f4-noattn.zip wget -O models/first_stage_models/vq-f8/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8.zip wget -O models/first_stage_models/vq-f8-n256/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip wget -O models/first_stage_models/vq-f16/model.zip https://ommer-lab.com/files/latent-diffusion/vq-f16.zip

thanks very much for your help. Have a nice day!

best regards

HUuxiaobin avatar Sep 13 '22 06:09 HUuxiaobin

@HUuxiaobin I'm running into the very same error. Did you accomplish to run the model checkpoint for semantic synthesis? I'm currently figuring, that the concatenation and torch shape need to be aligned with the UNet structure. I think therefore the semantic map needs to be encoded to match the latent image representation (probably 320²px in the downsized version according to the error, should be downsized by 4). Is this encoding process available anywhere? I only found it within inpaint.py. Besides that by reading the paper I'd say that the vq-f4 should be the correct first stage for the semantic synthesis.

bohnedan avatar Jan 19 '23 22:01 bohnedan

Hello, I am also working on the task of semantic image synthesis. ⭐May I ask which file should we run? Is it img2img. py? ⭐And how should the variable "config" be set to achieve the task of semantic synthesis? Should we use the yaml file in the configs folder or the yaml file in the models folder? Thank you so much and looking forward to your reply.❤

Hey-Sweety avatar May 27 '24 02:05 Hey-Sweety