guided-diffusion
guided-diffusion copied to clipboard
Apparent Architecture Mismatch when training on custom data
I am attempting to train a guided diffusion model on my own custom training data. I have successfully trained a model using the improved-diffusion github on my custom data and was able to sample images that fit the training data. I am now attempting to train a classifer on the same data and use it for the guided diffusion process. When I run my shell script, however, I get the following error:
Error Message
creating model and diffusion...
Traceback (most recent call last):
File "scripts/classifier_sample.py", line 134, in <module>
main()
File "scripts/classifier_sample.py", line 38, in main
model.load_state_dict(
File "/home/allenm/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias",
...
"output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias",
...
"output_blocks.23.0.out_layers.0.bias", "output_blocks.23.0.out_layers.3.weight", "output_blocks.23.0.out_layers.3.bias", "output_blocks.23.0.skip_connection.weight", "output_blocks.23.0.skip_connection.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.15.1.conv.weight", "output_blocks.15.1.conv.bias".
size mismatch for time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
size mismatch for time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for label_emb.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 1024]).
size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]).
size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
...
size mismatch for input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.13.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.13.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.13.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]).
size mismatch for input_blocks.13.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.13.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.13.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.13.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.13.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.13.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.14.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.14.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.14.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.14.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.15.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.15.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.15.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.15.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for input_blocks.17.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for input_blocks.17.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for input_blocks.17.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in
...
checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for output_blocks.17.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 256, 3, 3]).
size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]).
The scripts used for sampling, training the diffusion model, and training the classifier
This is the script that fails and gets the message above.
Image_sample.sh
#!/bin/bash
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 4 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
# CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
SAMPLE_FLAGS="--batch_size 4 --num_samples 50000 --timestep_respacing ddim25 --use_ddim True"
export OPENAI_LOGDIR="~/diffusion/guided-diffusion/outputs/first_output"
# fix the model path and classifier path
python scripts/classifier_sample.py \
--model_path ~/path/to/model010000.pt \
--classifier_path ~/path/to/model020000.pt \
$MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS
</details>
This is the script used to generate the diffusion model:
<details>
<summary>train.sh</summary>
#!/bin/bash
MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 4"
export OPENAI_LOGDIR="~/diffusion/improved-diffusion/training_logs/256_classcond/"
python scripts/image_train.py \
--data_dir /data/path/to/improved_diffusion_data \
$MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
This is the script used to generate the classifier:
classifier_train.sh
#!/bin/bash
TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 4 --lr 3e-4 --save_interval 10000 --weight_decay 0.05"
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
export OPENAI_LOGDIR="~/diffusion/guided-diffusion/trained_classifiers/256_class_test"
python scripts/classifier_train.py \
--data_dir /data/ur/berisha/mitch/improved_diffusion_data \
$TRAIN_FLAGS $CLASSIFIER_FLAGS
Based on the error message I believe it has to do with mismatched architecture. I am attempting to retrain with different hyperparemetrs but am uncertain which ones could be causing this problem. Admittedly I am not sure that this is even the cause, so it very well may be something else causing this problem.
Any pointers in the right direction would be greatly appreciated. I am reading over the paper again and watching some videos to see if that sheds some light as to how this problem is occuring. IF any more information is needed to help let me know!
Due to the character limit I had to remove a lot of the error message. It is posted in its entirety here in case it is needed:
Full Error Message
creating model and diffusion...
Traceback (most recent call last):
File "scripts/classifier_sample.py", line 134, in <module>
main()
File "scripts/classifier_sample.py", line 38, in main
model.load_state_dict(
File "/home/allenm/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", "output_blocks.8.2.out_layers.3.weight", "output_blocks.8.2.out_layers.3.bias", "output_blocks.11.1.in_layers.0.weight", "output_blocks.11.1.in_layers.0.bias", "output_blocks.11.1.in_layers.2.weight", "output_blocks.11.1.in_layers.2.bias", "output_blocks.11.1.emb_layers.1.weight", "output_blocks.11.1.emb_layers.1.bias", "output_blocks.11.1.out_layers.0.weight", "output_blocks.11.1.out_layers.0.bias", "output_blocks.11.1.out_layers.3.weight", "output_blocks.11.1.out_layers.3.bias", "output_blocks.14.1.in_layers.0.weight", "output_blocks.14.1.in_layers.0.bias", "output_blocks.14.1.in_layers.2.weight", "output_blocks.14.1.in_layers.2.bias", "output_blocks.14.1.emb_layers.1.weight", "output_blocks.14.1.emb_layers.1.bias", "output_blocks.14.1.out_layers.0.weight", "output_blocks.14.1.out_layers.0.bias", "output_blocks.14.1.out_layers.3.weight", "output_blocks.14.1.out_layers.3.bias".
Unexpected key(s) in state_dict: "input_blocks.18.0.in_layers.0.weight", "input_blocks.18.0.in_layers.0.bias", "input_blocks.18.0.in_layers.2.weight", "input_blocks.18.0.in_layers.2.bias", "input_blocks.18.0.emb_layers.1.weight", "input_blocks.18.0.emb_layers.1.bias", "input_blocks.18.0.out_layers.0.weight", "input_blocks.18.0.out_layers.0.bias", "input_blocks.18.0.out_layers.3.weight", "input_blocks.18.0.out_layers.3.bias", "input_blocks.18.1.norm.weight", "input_blocks.18.1.norm.bias", "input_blocks.18.1.qkv.weight", "input_blocks.18.1.qkv.bias", "input_blocks.18.1.proj_out.weight", "input_blocks.18.1.proj_out.bias", "input_blocks.19.0.in_layers.0.weight", "input_blocks.19.0.in_layers.0.bias", "input_blocks.19.0.in_layers.2.weight", "input_blocks.19.0.in_layers.2.bias", "input_blocks.19.0.emb_layers.1.weight", "input_blocks.19.0.emb_layers.1.bias", "input_blocks.19.0.out_layers.0.weight", "input_blocks.19.0.out_layers.0.bias", "input_blocks.19.0.out_layers.3.weight", "input_blocks.19.0.out_layers.3.bias", "input_blocks.19.1.norm.weight", "input_blocks.19.1.norm.bias", "input_blocks.19.1.qkv.weight", "input_blocks.19.1.qkv.bias", "input_blocks.19.1.proj_out.weight", "input_blocks.19.1.proj_out.bias", "input_blocks.20.0.op.weight", "input_blocks.20.0.op.bias", "input_blocks.21.0.in_layers.0.weight", "input_blocks.21.0.in_layers.0.bias", "input_blocks.21.0.in_layers.2.weight", "input_blocks.21.0.in_layers.2.bias", "input_blocks.21.0.emb_layers.1.weight", "input_blocks.21.0.emb_layers.1.bias", "input_blocks.21.0.out_layers.0.weight", "input_blocks.21.0.out_layers.0.bias", "input_blocks.21.0.out_layers.3.weight", "input_blocks.21.0.out_layers.3.bias", "input_blocks.21.1.norm.weight", "input_blocks.21.1.norm.bias", "input_blocks.21.1.qkv.weight", "input_blocks.21.1.qkv.bias", "input_blocks.21.1.proj_out.weight", "input_blocks.21.1.proj_out.bias", "input_blocks.22.0.in_layers.0.weight", "input_blocks.22.0.in_layers.0.bias", "input_blocks.22.0.in_layers.2.weight", "input_blocks.22.0.in_layers.2.bias", "input_blocks.22.0.emb_layers.1.weight", "input_blocks.22.0.emb_layers.1.bias", "input_blocks.22.0.out_layers.0.weight", "input_blocks.22.0.out_layers.0.bias", "input_blocks.22.0.out_layers.3.weight", "input_blocks.22.0.out_layers.3.bias", "input_blocks.22.1.norm.weight", "input_blocks.22.1.norm.bias", "input_blocks.22.1.qkv.weight", "input_blocks.22.1.qkv.bias", "input_blocks.22.1.proj_out.weight", "input_blocks.22.1.proj_out.bias", "input_blocks.23.0.in_layers.0.weight", "input_blocks.23.0.in_layers.0.bias", "input_blocks.23.0.in_layers.2.weight", "input_blocks.23.0.in_layers.2.bias", "input_blocks.23.0.emb_layers.1.weight", "input_blocks.23.0.emb_layers.1.bias", "input_blocks.23.0.out_layers.0.weight", "input_blocks.23.0.out_layers.0.bias", "input_blocks.23.0.out_layers.3.weight", "input_blocks.23.0.out_layers.3.bias", "input_blocks.23.1.norm.weight", "input_blocks.23.1.norm.bias", "input_blocks.23.1.qkv.weight", "input_blocks.23.1.qkv.bias", "input_blocks.23.1.proj_out.weight", "input_blocks.23.1.proj_out.bias", "input_blocks.4.0.op.weight", "input_blocks.4.0.op.bias", "input_blocks.8.0.op.weight", "input_blocks.8.0.op.bias", "input_blocks.9.0.skip_connection.weight", "input_blocks.9.0.skip_connection.bias", "input_blocks.12.0.op.weight", "input_blocks.12.0.op.bias", "input_blocks.16.0.op.weight", "input_blocks.16.0.op.bias", "input_blocks.17.0.skip_connection.weight", "input_blocks.17.0.skip_connection.bias", "output_blocks.18.0.in_layers.0.weight", "output_blocks.18.0.in_layers.0.bias", "output_blocks.18.0.in_layers.2.weight", "output_blocks.18.0.in_layers.2.bias", "output_blocks.18.0.emb_layers.1.weight", "output_blocks.18.0.emb_layers.1.bias", "output_blocks.18.0.out_layers.0.weight", "output_blocks.18.0.out_layers.0.bias", "output_blocks.18.0.out_layers.3.weight", "output_blocks.18.0.out_layers.3.bias", "output_blocks.18.0.skip_connection.weight", "output_blocks.18.0.skip_connection.bias", "output_blocks.19.0.in_layers.0.weight", "output_blocks.19.0.in_layers.0.bias", "output_blocks.19.0.in_layers.2.weight", "output_blocks.19.0.in_layers.2.bias", "output_blocks.19.0.emb_layers.1.weight", "output_blocks.19.0.emb_layers.1.bias", "output_blocks.19.0.out_layers.0.weight", "output_blocks.19.0.out_layers.0.bias", "output_blocks.19.0.out_layers.3.weight", "output_blocks.19.0.out_layers.3.bias", "output_blocks.19.0.skip_connection.weight", "output_blocks.19.0.skip_connection.bias", "output_blocks.19.1.conv.weight", "output_blocks.19.1.conv.bias", "output_blocks.20.0.in_layers.0.weight", "output_blocks.20.0.in_layers.0.bias", "output_blocks.20.0.in_layers.2.weight", "output_blocks.20.0.in_layers.2.bias", "output_blocks.20.0.emb_layers.1.weight", "output_blocks.20.0.emb_layers.1.bias", "output_blocks.20.0.out_layers.0.weight", "output_blocks.20.0.out_layers.0.bias", "output_blocks.20.0.out_layers.3.weight", "output_blocks.20.0.out_layers.3.bias", "output_blocks.20.0.skip_connection.weight", "output_blocks.20.0.skip_connection.bias", "output_blocks.21.0.in_layers.0.weight", "output_blocks.21.0.in_layers.0.bias", "output_blocks.21.0.in_layers.2.weight", "output_blocks.21.0.in_layers.2.bias", "output_blocks.21.0.emb_layers.1.weight", "output_blocks.21.0.emb_layers.1.bias", "output_blocks.21.0.out_layers.0.weight", "output_blocks.21.0.out_layers.0.bias", "output_blocks.21.0.out_layers.3.weight", "output_blocks.21.0.out_layers.3.bias", "output_blocks.21.0.skip_connection.weight", "output_blocks.21.0.skip_connection.bias", "output_blocks.22.0.in_layers.0.weight", "output_blocks.22.0.in_layers.0.bias", "output_blocks.22.0.in_layers.2.weight", "output_blocks.22.0.in_layers.2.bias", "output_blocks.22.0.emb_layers.1.weight", "output_blocks.22.0.emb_layers.1.bias", "output_blocks.22.0.out_layers.0.weight", "output_blocks.22.0.out_layers.0.bias", "output_blocks.22.0.out_layers.3.weight", "output_blocks.22.0.out_layers.3.bias", "output_blocks.22.0.skip_connection.weight", "output_blocks.22.0.skip_connection.bias", "output_blocks.23.0.in_layers.0.weight", "output_blocks.23.0.in_layers.0.bias", "output_blocks.23.0.in_layers.2.weight", "output_blocks.23.0.in_layers.2.bias", "output_blocks.23.0.emb_layers.1.weight", "output_blocks.23.0.emb_layers.1.bias", "output_blocks.23.0.out_layers.0.weight", "output_blocks.23.0.out_layers.0.bias", "output_blocks.23.0.out_layers.3.weight", "output_blocks.23.0.out_layers.3.bias", "output_blocks.23.0.skip_connection.weight", "output_blocks.23.0.skip_connection.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.15.1.conv.weight", "output_blocks.15.1.conv.bias".
size mismatch for time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
size mismatch for time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for label_emb.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 1024]).
size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]).
size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for input_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.13.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.13.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.13.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]).
size mismatch for input_blocks.13.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.13.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.13.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.13.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.13.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.13.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.14.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.14.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.14.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.14.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.14.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.15.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.15.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.15.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.15.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.15.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for input_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for input_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for input_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for input_blocks.17.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for input_blocks.17.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for input_blocks.17.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for input_blocks.17.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.3.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.3.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.3.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
size mismatch for output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
size mismatch for output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.4.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.4.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.4.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1536, 3, 3]).
size mismatch for output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
size mismatch for output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1536, 1, 1]).
size mismatch for output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.5.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
size mismatch for output_blocks.5.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
size mismatch for output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
size mismatch for output_blocks.5.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1536, 3, 3]).
size mismatch for output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1536, 1, 1]).
size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
size mismatch for output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
size mismatch for output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([256, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
size mismatch for output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
size mismatch for output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
size mismatch for output_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for output_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for output_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
size mismatch for output_blocks.9.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
size mismatch for output_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for output_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for output_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.10.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
size mismatch for output_blocks.10.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 768, 3, 3]).
size mismatch for output_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for output_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for output_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for output_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.11.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 768, 1, 1]).
size mismatch for output_blocks.11.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.12.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.12.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.12.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 768, 3, 3]).
size mismatch for output_blocks.12.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for output_blocks.12.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 768, 1, 1]).
size mismatch for output_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for output_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for output_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for output_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for output_blocks.15.0.skip_connection.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for output_blocks.16.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.16.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.16.0.in_layers.2.weight: copying a param with shape torch.Size([128, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for output_blocks.16.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.16.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for output_blocks.16.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.16.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.16.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.16.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for output_blocks.16.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.16.0.skip_connection.weight: copying a param with shape torch.Size([128, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for output_blocks.16.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for output_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
size mismatch for output_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for output_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.17.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for output_blocks.17.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 256, 3, 3]).
size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]).
Additionally, here is the script used to train the diffusion model:
train.sh
#!/bin/bash
MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 4"
python scripts/image_train.py \
--data_dir /path/to/improved_diffusion_data \
$MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
@Mitchnoff, I get the same error. Could you let me know if you found a solution?
@Mitchnoff, I get the same error. Could you let me know if you found a solution?
I've tried working on 64x64 images to start but am still facing the same issue. I will definitely update this if a solution is found. What happened on your side to get the same results? Were you trying to train on your own dataset?
@Mitchnoff, I get the same error. Have you found a solution?
I get the same error. Have you found a solution?
@shengshneg123 @HioZx @Mitchnoff @anicej has anyone found the solution yet?