Apparent Architecture Mismatch when training on custom data

Open Mitchnoff opened this issue 1 year ago • 8 comments

I am attempting to train a guided diffusion model on my own custom training data. I have successfully trained a model on this data using the improved-diffusion GitHub repository and was able to sample images that fit the training data. I am now attempting to train a classifier on the same data and use it for the guided diffusion process. When I run my shell script, however, I get the following error:
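The error message below is long, so a compact way to read it may help: compare the parameter names and shapes stored in the checkpoint with those of the freshly built model. The following is only a minimal sketch in plain Python, not part of either repository. `diff_shapes` is a hypothetical helper, the two mismatched entries are copied from the error message, and the shape given for the missing key is assumed purely for illustration. With PyTorch, each mapping could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(
    checkpoint: Dict[str, Shape], model: Dict[str, Shape]
) -> Tuple[List[str], List[str], List[Tuple[str, Shape, Shape]]]:
    """Compare two name -> shape mappings (hypothetical helper).

    Returns keys the model expects but the checkpoint lacks, keys only the
    checkpoint has, and keys present in both whose shapes differ.
    """
    missing = sorted(k for k in model if k not in checkpoint)
    unexpected = sorted(k for k in checkpoint if k not in model)
    mismatched = sorted(
        (k, checkpoint[k], model[k])
        for k in checkpoint
        if k in model and checkpoint[k] != model[k]
    )
    return missing, unexpected, mismatched


# Two entries copied from the error message in this issue.
checkpoint_shapes = {
    "time_embed.0.weight": (512, 128),
    "input_blocks.0.0.weight": (128, 3, 3, 3),
}
model_shapes = {
    "time_embed.0.weight": (1024, 256),
    "input_blocks.0.0.weight": (256, 3, 3, 3),
    # Reported as a missing key; this shape is ASSUMED for illustration only.
    "input_blocks.4.0.in_layers.0.weight": (256,),
}

missing, unexpected, mismatched = diff_shapes(checkpoint_shapes, model_shapes)

print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
for name, ckpt_shape, model_shape in mismatched:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Read this way, the first convolution (`input_blocks.0.0.weight`) has 128 output channels in the checkpoint but 256 in the model being loaded into, which suggests the two were built with different architecture settings rather than the checkpoint being corrupted.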

Error Message
creating model and diffusion...
Traceback (most recent call last):
  File "scripts/classifier_sample.py", line 134, in <module>
    main()
  File "scripts/classifier_sample.py", line 38, in main
    model.load_state_dict(
  File "/home/allenm/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
        Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", 
"input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", 
...
"output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", 
...
"output_blocks.23.0.out_layers.0.bias", "output_blocks.23.0.out_layers.3.weight", "output_blocks.23.0.out_layers.3.bias", "output_blocks.23.0.skip_connection.weight", "output_blocks.23.0.skip_connection.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.15.1.conv.weight", "output_blocks.15.1.conv.bias".
        size mismatch for time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
        size mismatch for time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for label_emb.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 1024]).
        size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]).
        size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
...
        size mismatch for input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
        size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.13.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.13.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.13.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]).
        size mismatch for input_blocks.13.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.13.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.13.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.13.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.13.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.13.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.14.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.14.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.14.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.14.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.15.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.15.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.15.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.15.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for input_blocks.17.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for input_blocks.17.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for input_blocks.17.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in 
...
checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
        size mismatch for output_blocks.17.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 256, 3, 3]).
        size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]).
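
The last lines of the message already hint at which settings differ. As a rough sketch, assuming `out.0` is the final normalization layer (so its width equals `--num_channels` when the first channel multiplier is 1) and `out.2` is the final convolution (3 output channels for RGB images, 6 when `--learn_sigma True`), the shapes can be read back into flags. The helper name `infer_settings` is made up for illustration and is not part of guided-diffusion.

```python
# Shapes copied from the last lines of the error message:
# name -> (shape in checkpoint, shape in current model)
reported = {
    "out.0.weight": ((128,), (256,)),
    "out.2.weight": ((3, 128, 3, 3), (6, 256, 3, 3)),
}


def infer_settings(out_norm_shape, out_conv_shape, image_channels=3):
    """Guess two UNet flags from the shapes of the output layers.

    Assumption: the final norm has num_channels features and the final
    conv emits image_channels outputs, doubled when learn_sigma is on.
    """
    return {
        "num_channels": out_norm_shape[0],
        "learn_sigma": out_conv_shape[0] == 2 * image_channels,
    }


checkpoint = infer_settings(reported["out.0.weight"][0], reported["out.2.weight"][0])
current = infer_settings(reported["out.0.weight"][1], reported["out.2.weight"][1])

print("checkpoint:   ", checkpoint)
print("current model:", current)
```

If those assumptions hold, the checkpoint looks like a 128-channel model without learned sigma, while the sampling script builds a 256-channel model with it.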

The scripts used for sampling, training the diffusion model, and training the classifier

This is the script that fails and produces the error message above.

Image_sample.sh
#!/bin/bash

MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 4 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
# CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
SAMPLE_FLAGS="--batch_size 4 --num_samples 50000 --timestep_respacing ddim25 --use_ddim True"

export OPENAI_LOGDIR="~/diffusion/guided-diffusion/outputs/first_output"

# fix the model path and classifier path

python scripts/classifier_sample.py \
    --model_path ~/path/to/model010000.pt \
    --classifier_path ~/path/to/model020000.pt \
    $MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS

This is the script used to generate the diffusion model:

train.sh
#!/bin/bash

MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 4"

export OPENAI_LOGDIR="~/diffusion/improved-diffusion/training_logs/256_classcond/"

python scripts/image_train.py \
    --data_dir /data/path/to/improved_diffusion_data \
    $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
    

This is the script used to generate the classifier:

classifier_train.sh
#!/bin/bash

TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 4 --lr 3e-4 --save_interval 10000 --weight_decay 0.05"
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"

export OPENAI_LOGDIR="~/diffusion/guided-diffusion/trained_classifiers/256_class_test"

python scripts/classifier_train.py \
    --data_dir /data/ur/berisha/mitch/improved_diffusion_data \
    $TRAIN_FLAGS $CLASSIFIER_FLAGS

Based on the error message, I believe the problem is a mismatched architecture. I am attempting to retrain with different hyperparameters but am uncertain which ones could be causing it. Admittedly, I am not sure that this is even the cause, so it may well be something else.
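
One way to narrow this down is to diff the flags passed at sampling time against those used for training. The following is a minimal sketch, not part of either repository: `parse_flags` and `diff_flags` are hypothetical helpers, and the flag strings are copied from the scripts above. A flag reported as `(not set)` falls back to that script's default, which may or may not match the other side.

```python
import shlex


def parse_flags(flag_string):
    """Turn '--name value --name value' into a dict of name -> value."""
    tokens = shlex.split(flag_string)
    return {tokens[i].lstrip("-"): tokens[i + 1] for i in range(0, len(tokens), 2)}


def diff_flags(sampling, training):
    """Return {flag: (sampling value, training value)} for flags that differ."""
    s, t = parse_flags(sampling), parse_flags(training)
    names = sorted(set(s) | set(t))
    pairs = {n: (s.get(n, "(not set)"), t.get(n, "(not set)")) for n in names}
    return {n: p for n, p in pairs.items() if p[0] != p[1]}


# MODEL_FLAGS from the sampling script vs. train.sh
model_diff = diff_flags(
    "--attention_resolutions 32,16,8 --class_cond True --image_size 256 "
    "--learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 "
    "--resblock_updown True --use_fp16 True --use_scale_shift_norm True",
    "--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True",
)

# CLASSIFIER_FLAGS from the sampling script vs. classifier_train.sh
classifier_diff = diff_flags(
    "--image_size 256 --classifier_attention_resolutions 32,16,8 "
    "--classifier_depth 4 --classifier_width 32 --classifier_pool attention "
    "--classifier_resblock_updown True --classifier_use_scale_shift_norm True "
    "--classifier_scale 1.0 --classifier_use_fp16 True",
    "--image_size 256 --classifier_attention_resolutions 32,16,8 "
    "--classifier_depth 2 --classifier_width 32 --classifier_pool attention "
    "--classifier_resblock_updown True --classifier_use_scale_shift_norm True",
)

for label, diff in (("model", model_diff), ("classifier", classifier_diff)):
    for name, (sampling_value, training_value) in diff.items():
        print(f"{label}: {name}: sampling={sampling_value} training={training_value}")
```

Every flag this prints is a candidate for the mismatch; `image_size` and `class_cond` agree between the two model scripts, so they are not listed.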

Any pointers in the right direction would be greatly appreciated. I am reading over the paper again and watching some videos to see if that sheds some light on how this problem is occurring. If any more information is needed to help, let me know!

Mitchnoff avatar Mar 13 '23 23:03 Mitchnoff

Due to the character limit I had to remove a lot of the error message. It is posted in its entirety here in case it is needed:

Full Error Message
creating model and diffusion...
Traceback (most recent call last):
  File "scripts/classifier_sample.py", line 134, in <module>
    main()
  File "scripts/classifier_sample.py", line 38, in main
    model.load_state_dict(
  File "/home/allenm/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
        Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", 
"input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", 
"output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", "output_blocks.8.2.out_layers.3.weight", "output_blocks.8.2.out_layers.3.bias", "output_blocks.11.1.in_layers.0.weight", "output_blocks.11.1.in_layers.0.bias", "output_blocks.11.1.in_layers.2.weight", "output_blocks.11.1.in_layers.2.bias", "output_blocks.11.1.emb_layers.1.weight", "output_blocks.11.1.emb_layers.1.bias", "output_blocks.11.1.out_layers.0.weight", "output_blocks.11.1.out_layers.0.bias", "output_blocks.11.1.out_layers.3.weight", "output_blocks.11.1.out_layers.3.bias", "output_blocks.14.1.in_layers.0.weight", "output_blocks.14.1.in_layers.0.bias", "output_blocks.14.1.in_layers.2.weight", "output_blocks.14.1.in_layers.2.bias", "output_blocks.14.1.emb_layers.1.weight", "output_blocks.14.1.emb_layers.1.bias", "output_blocks.14.1.out_layers.0.weight", "output_blocks.14.1.out_layers.0.bias", "output_blocks.14.1.out_layers.3.weight", "output_blocks.14.1.out_layers.3.bias".
        Unexpected key(s) in state_dict: "input_blocks.18.0.in_layers.0.weight", "input_blocks.18.0.in_layers.0.bias", "input_blocks.18.0.in_layers.2.weight", "input_blocks.18.0.in_layers.2.bias", "input_blocks.18.0.emb_layers.1.weight", "input_blocks.18.0.emb_layers.1.bias", "input_blocks.18.0.out_layers.0.weight", "input_blocks.18.0.out_layers.0.bias", "input_blocks.18.0.out_layers.3.weight", "input_blocks.18.0.out_layers.3.bias", "input_blocks.18.1.norm.weight", "input_blocks.18.1.norm.bias", "input_blocks.18.1.qkv.weight", "input_blocks.18.1.qkv.bias", "input_blocks.18.1.proj_out.weight", "input_blocks.18.1.proj_out.bias", "input_blocks.19.0.in_layers.0.weight", "input_blocks.19.0.in_layers.0.bias", "input_blocks.19.0.in_layers.2.weight", "input_blocks.19.0.in_layers.2.bias", "input_blocks.19.0.emb_layers.1.weight", "input_blocks.19.0.emb_layers.1.bias", "input_blocks.19.0.out_layers.0.weight", "input_blocks.19.0.out_layers.0.bias", "input_blocks.19.0.out_layers.3.weight", "input_blocks.19.0.out_layers.3.bias", "input_blocks.19.1.norm.weight", "input_blocks.19.1.norm.bias", "input_blocks.19.1.qkv.weight", "input_blocks.19.1.qkv.bias", "input_blocks.19.1.proj_out.weight", "input_blocks.19.1.proj_out.bias", "input_blocks.20.0.op.weight", "input_blocks.20.0.op.bias", "input_blocks.21.0.in_layers.0.weight", "input_blocks.21.0.in_layers.0.bias", "input_blocks.21.0.in_layers.2.weight", "input_blocks.21.0.in_layers.2.bias", "input_blocks.21.0.emb_layers.1.weight", "input_blocks.21.0.emb_layers.1.bias", "input_blocks.21.0.out_layers.0.weight", "input_blocks.21.0.out_layers.0.bias", "input_blocks.21.0.out_layers.3.weight", "input_blocks.21.0.out_layers.3.bias", "input_blocks.21.1.norm.weight", "input_blocks.21.1.norm.bias", "input_blocks.21.1.qkv.weight", "input_blocks.21.1.qkv.bias", "input_blocks.21.1.proj_out.weight", "input_blocks.21.1.proj_out.bias", "input_blocks.22.0.in_layers.0.weight", "input_blocks.22.0.in_layers.0.bias", 
"input_blocks.22.0.in_layers.2.weight", "input_blocks.22.0.in_layers.2.bias", "input_blocks.22.0.emb_layers.1.weight", "input_blocks.22.0.emb_layers.1.bias", "input_blocks.22.0.out_layers.0.weight", "input_blocks.22.0.out_layers.0.bias", "input_blocks.22.0.out_layers.3.weight", "input_blocks.22.0.out_layers.3.bias", "input_blocks.22.1.norm.weight", "input_blocks.22.1.norm.bias", "input_blocks.22.1.qkv.weight", "input_blocks.22.1.qkv.bias", "input_blocks.22.1.proj_out.weight", "input_blocks.22.1.proj_out.bias", "input_blocks.23.0.in_layers.0.weight", "input_blocks.23.0.in_layers.0.bias", "input_blocks.23.0.in_layers.2.weight", "input_blocks.23.0.in_layers.2.bias", "input_blocks.23.0.emb_layers.1.weight", "input_blocks.23.0.emb_layers.1.bias", "input_blocks.23.0.out_layers.0.weight", "input_blocks.23.0.out_layers.0.bias", "input_blocks.23.0.out_layers.3.weight", "input_blocks.23.0.out_layers.3.bias", "input_blocks.23.1.norm.weight", "input_blocks.23.1.norm.bias", "input_blocks.23.1.qkv.weight", "input_blocks.23.1.qkv.bias", "input_blocks.23.1.proj_out.weight", "input_blocks.23.1.proj_out.bias", "input_blocks.4.0.op.weight", "input_blocks.4.0.op.bias", "input_blocks.8.0.op.weight", "input_blocks.8.0.op.bias", "input_blocks.9.0.skip_connection.weight", "input_blocks.9.0.skip_connection.bias", "input_blocks.12.0.op.weight", "input_blocks.12.0.op.bias", "input_blocks.16.0.op.weight", "input_blocks.16.0.op.bias", "input_blocks.17.0.skip_connection.weight", "input_blocks.17.0.skip_connection.bias", "output_blocks.18.0.in_layers.0.weight", "output_blocks.18.0.in_layers.0.bias", "output_blocks.18.0.in_layers.2.weight", "output_blocks.18.0.in_layers.2.bias", "output_blocks.18.0.emb_layers.1.weight", "output_blocks.18.0.emb_layers.1.bias", "output_blocks.18.0.out_layers.0.weight", "output_blocks.18.0.out_layers.0.bias", "output_blocks.18.0.out_layers.3.weight", "output_blocks.18.0.out_layers.3.bias", "output_blocks.18.0.skip_connection.weight", 
"output_blocks.18.0.skip_connection.bias", "output_blocks.19.0.in_layers.0.weight", "output_blocks.19.0.in_layers.0.bias", "output_blocks.19.0.in_layers.2.weight", "output_blocks.19.0.in_layers.2.bias", "output_blocks.19.0.emb_layers.1.weight", "output_blocks.19.0.emb_layers.1.bias", "output_blocks.19.0.out_layers.0.weight", "output_blocks.19.0.out_layers.0.bias", "output_blocks.19.0.out_layers.3.weight", "output_blocks.19.0.out_layers.3.bias", "output_blocks.19.0.skip_connection.weight", "output_blocks.19.0.skip_connection.bias", "output_blocks.19.1.conv.weight", "output_blocks.19.1.conv.bias", "output_blocks.20.0.in_layers.0.weight", "output_blocks.20.0.in_layers.0.bias", "output_blocks.20.0.in_layers.2.weight", "output_blocks.20.0.in_layers.2.bias", "output_blocks.20.0.emb_layers.1.weight", "output_blocks.20.0.emb_layers.1.bias", "output_blocks.20.0.out_layers.0.weight", "output_blocks.20.0.out_layers.0.bias", "output_blocks.20.0.out_layers.3.weight", "output_blocks.20.0.out_layers.3.bias", "output_blocks.20.0.skip_connection.weight", "output_blocks.20.0.skip_connection.bias", "output_blocks.21.0.in_layers.0.weight", "output_blocks.21.0.in_layers.0.bias", "output_blocks.21.0.in_layers.2.weight", "output_blocks.21.0.in_layers.2.bias", "output_blocks.21.0.emb_layers.1.weight", "output_blocks.21.0.emb_layers.1.bias", "output_blocks.21.0.out_layers.0.weight", "output_blocks.21.0.out_layers.0.bias", "output_blocks.21.0.out_layers.3.weight", "output_blocks.21.0.out_layers.3.bias", "output_blocks.21.0.skip_connection.weight", "output_blocks.21.0.skip_connection.bias", "output_blocks.22.0.in_layers.0.weight", "output_blocks.22.0.in_layers.0.bias", "output_blocks.22.0.in_layers.2.weight", "output_blocks.22.0.in_layers.2.bias", "output_blocks.22.0.emb_layers.1.weight", "output_blocks.22.0.emb_layers.1.bias", "output_blocks.22.0.out_layers.0.weight", "output_blocks.22.0.out_layers.0.bias", "output_blocks.22.0.out_layers.3.weight", "output_blocks.22.0.out_layers.3.bias", 
"output_blocks.22.0.skip_connection.weight", "output_blocks.22.0.skip_connection.bias", "output_blocks.23.0.in_layers.0.weight", "output_blocks.23.0.in_layers.0.bias", "output_blocks.23.0.in_layers.2.weight", "output_blocks.23.0.in_layers.2.bias", "output_blocks.23.0.emb_layers.1.weight", "output_blocks.23.0.emb_layers.1.bias", "output_blocks.23.0.out_layers.0.weight", "output_blocks.23.0.out_layers.0.bias", "output_blocks.23.0.out_layers.3.weight", "output_blocks.23.0.out_layers.3.bias", "output_blocks.23.0.skip_connection.weight", "output_blocks.23.0.skip_connection.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.15.1.conv.weight", "output_blocks.15.1.conv.bias".
        size mismatch for time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
        size mismatch for time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for label_emb.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 1024]).
        size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]).
        size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for input_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for input_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
        size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.13.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.13.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for input_blocks.13.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]).
        size mismatch for input_blocks.13.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.13.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.13.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.13.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.13.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.13.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.14.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.14.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.14.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.14.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.14.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.15.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.15.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.15.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.15.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.15.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for input_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for input_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for input_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for input_blocks.17.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for input_blocks.17.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for input_blocks.17.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for input_blocks.17.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.3.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.3.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.3.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]).
        size mismatch for output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]).
        size mismatch for output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.4.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.4.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.4.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1536, 3, 3]).
        size mismatch for output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
        size mismatch for output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]).
        size mismatch for output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]).
        size mismatch for output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1536, 1, 1]).
        size mismatch for output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.5.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]).
        size mismatch for output_blocks.5.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]).
        size mismatch for output_blocks.5.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1536, 3, 3]).
        size mismatch for output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1536, 1, 1]).
        size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
        size mismatch for output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
        size mismatch for output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([256, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
        size mismatch for output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
        size mismatch for output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
        size mismatch for output_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for output_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for output_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
        size mismatch for output_blocks.9.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]).
        size mismatch for output_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for output_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for output_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.10.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]).
        size mismatch for output_blocks.10.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 768, 3, 3]).
        size mismatch for output_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
        size mismatch for output_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
        size mismatch for output_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
        size mismatch for output_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.11.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 768, 1, 1]).
        size mismatch for output_blocks.11.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.12.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.12.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for output_blocks.12.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 768, 3, 3]).
        size mismatch for output_blocks.12.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for output_blocks.12.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 768, 1, 1]).
        size mismatch for output_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for output_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for output_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
        size mismatch for output_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for output_blocks.15.0.skip_connection.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
        size mismatch for output_blocks.16.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.16.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.16.0.in_layers.2.weight: copying a param with shape torch.Size([128, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
        size mismatch for output_blocks.16.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.16.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for output_blocks.16.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.16.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.16.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.16.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for output_blocks.16.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.16.0.skip_connection.weight: copying a param with shape torch.Size([128, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
        size mismatch for output_blocks.16.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
        size mismatch for output_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for output_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for output_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
        size mismatch for output_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for output_blocks.17.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
        size mismatch for output_blocks.17.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 256, 3, 3]).
        size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]).

Mitchnoff, Mar 13 '23 23:03

Additionally, here is the script used to train the diffusion model:

train.sh
#!/bin/bash

MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 4"

python scripts/image_train.py \
    --data_dir /path/to/improved_diffusion_data \
    $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

Mitchnoff, Mar 13 '23 23:03
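
For anyone hitting the same error, one way to narrow it down is to compare the parameter shapes stored in the checkpoint with those of the model that the sampling script builds. The sketch below is a minimal, hypothetical helper (`diff_shapes` is not part of guided-diffusion or improved-diffusion) that works on plain name-to-shape dictionaries. The sample values are copied from the `out.*` lines of the log above. In that log the checkpoint's `out.0.weight` has 128 entries, which matches `--num_channels 128` in train.sh, while the current model has 256. This suggests the sampling script was launched with different model flags than the training script. The 3 vs. 6 output channels may reflect a `learn_sigma` difference, but that is an assumption, not something the log confirms.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(checkpoint: Dict[str, Shape], model: Dict[str, Shape]) -> Dict[str, List]:
    """Summarise how a checkpoint's parameters differ from a model's.

    Hypothetical helper for illustration; both arguments map a
    parameter name to its shape.
    """
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(k for k in model if k not in checkpoint)
    # Keys present in the checkpoint that the model does not define.
    unexpected = sorted(k for k in checkpoint if k not in model)
    # Keys present on both sides whose shapes disagree.
    mismatched = sorted(
        (k, checkpoint[k], model[k])
        for k in checkpoint
        if k in model and checkpoint[k] != model[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Shapes copied from the last lines of the error log (checkpoint vs. current model).
ckpt_shapes = {
    "out.0.weight": (128,),
    "out.2.weight": (3, 128, 3, 3),
    "out.2.bias": (3,),
}
model_shapes = {
    "out.0.weight": (256,),
    "out.2.weight": (6, 256, 3, 3),
    "out.2.bias": (6,),
}

report = diff_shapes(ckpt_shapes, model_shapes)
for name, ckpt_shape, model_shape in report["mismatched"]:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch, the two dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, using the result of `torch.load(path, map_location="cpu")` for the checkpoint and `model.state_dict()` for the model. This assumes the checkpoint file stores a plain state dict.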

@Mitchnoff, I get the same error. Could you let me know if you found a solution?

anicej, Mar 21 '23 10:03

> @Mitchnoff, I get the same error. Could you let me know if you found a solution?

I tried starting with 64x64 images but am still facing the same issue. I will definitely update this thread if I find a solution. What were you doing when you ran into the same error? Were you also trying to train on your own dataset?

Mitchnoff, Mar 21 '23 18:03

@Mitchnoff, I get the same error. Have you found a solution?

HioZx, Apr 25 '23 12:04

I get the same error. Have you found a solution?

shengshneg123, Jul 18 '23 01:07

@shengshneg123 @HioZx @Mitchnoff @anicej Has anyone found a solution yet?

sushilkhadkaanon, Feb 11 '24 18:02