Mismatch between model and checkpoint

Open Kaiwen-Zhu opened this issue 1 year ago • 0 comments

I tried to load the VAE, only to find many missing and unexpected parameter keys. There seems to be a one-to-one correspondence between the missing and unexpected keys (e.g., the unexpected encoder.down_blocks.0.downsamplers.0.conv.bias corresponds to the missing encoder.down.0.downsample.conv.bias). However, when I map the keys manually, some parameter sizes still do not match. The problem occurs with both FLUX.1 [schnell] and FLUX.1 [dev]. Is this due to improper use on my part, a version issue, or something else? Thank you!

Minimal reproducible code

export AE=<path to AE checkpoint downloaded from https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/vae/diffusion_pytorch_model.safetensors>
import torch
from flux.util import load_ae

device = torch.device('cuda')
ae = load_ae("flux-schnell", device=device)

236 missing keys

encoder.down.0.block.0.norm1.weight encoder.down.0.block.0.norm1.bias encoder.down.0.block.0.conv1.weight encoder.down.0.block.0.conv1.bias encoder.down.0.block.0.norm2.weight encoder.down.0.block.0.norm2.bias encoder.down.0.block.0.conv2.weight encoder.down.0.block.0.conv2.bias encoder.down.0.block.1.norm1.weight encoder.down.0.block.1.norm1.bias encoder.down.0.block.1.conv1.weight encoder.down.0.block.1.conv1.bias encoder.down.0.block.1.norm2.weight encoder.down.0.block.1.norm2.bias encoder.down.0.block.1.conv2.weight encoder.down.0.block.1.conv2.bias encoder.down.0.downsample.conv.weight encoder.down.0.downsample.conv.bias encoder.down.1.block.0.norm1.weight encoder.down.1.block.0.norm1.bias encoder.down.1.block.0.conv1.weight encoder.down.1.block.0.conv1.bias encoder.down.1.block.0.norm2.weight encoder.down.1.block.0.norm2.bias encoder.down.1.block.0.conv2.weight encoder.down.1.block.0.conv2.bias encoder.down.1.block.0.nin_shortcut.weight encoder.down.1.block.0.nin_shortcut.bias encoder.down.1.block.1.norm1.weight encoder.down.1.block.1.norm1.bias encoder.down.1.block.1.conv1.weight encoder.down.1.block.1.conv1.bias encoder.down.1.block.1.norm2.weight encoder.down.1.block.1.norm2.bias encoder.down.1.block.1.conv2.weight encoder.down.1.block.1.conv2.bias encoder.down.1.downsample.conv.weight encoder.down.1.downsample.conv.bias encoder.down.2.block.0.norm1.weight encoder.down.2.block.0.norm1.bias encoder.down.2.block.0.conv1.weight encoder.down.2.block.0.conv1.bias encoder.down.2.block.0.norm2.weight encoder.down.2.block.0.norm2.bias encoder.down.2.block.0.conv2.weight encoder.down.2.block.0.conv2.bias encoder.down.2.block.0.nin_shortcut.weight encoder.down.2.block.0.nin_shortcut.bias encoder.down.2.block.1.norm1.weight encoder.down.2.block.1.norm1.bias encoder.down.2.block.1.conv1.weight encoder.down.2.block.1.conv1.bias encoder.down.2.block.1.norm2.weight encoder.down.2.block.1.norm2.bias encoder.down.2.block.1.conv2.weight encoder.down.2.block.1.conv2.bias 
encoder.down.2.downsample.conv.weight encoder.down.2.downsample.conv.bias encoder.down.3.block.0.norm1.weight encoder.down.3.block.0.norm1.bias encoder.down.3.block.0.conv1.weight encoder.down.3.block.0.conv1.bias encoder.down.3.block.0.norm2.weight encoder.down.3.block.0.norm2.bias encoder.down.3.block.0.conv2.weight encoder.down.3.block.0.conv2.bias encoder.down.3.block.1.norm1.weight encoder.down.3.block.1.norm1.bias encoder.down.3.block.1.conv1.weight encoder.down.3.block.1.conv1.bias encoder.down.3.block.1.norm2.weight encoder.down.3.block.1.norm2.bias encoder.down.3.block.1.conv2.weight encoder.down.3.block.1.conv2.bias encoder.mid.block_1.norm1.weight encoder.mid.block_1.norm1.bias encoder.mid.block_1.conv1.weight encoder.mid.block_1.conv1.bias encoder.mid.block_1.norm2.weight encoder.mid.block_1.norm2.bias encoder.mid.block_1.conv2.weight encoder.mid.block_1.conv2.bias encoder.mid.attn_1.norm.weight encoder.mid.attn_1.norm.bias encoder.mid.attn_1.q.weight encoder.mid.attn_1.q.bias encoder.mid.attn_1.k.weight encoder.mid.attn_1.k.bias encoder.mid.attn_1.v.weight encoder.mid.attn_1.v.bias encoder.mid.attn_1.proj_out.weight encoder.mid.attn_1.proj_out.bias encoder.mid.block_2.norm1.weight encoder.mid.block_2.norm1.bias encoder.mid.block_2.conv1.weight encoder.mid.block_2.conv1.bias encoder.mid.block_2.norm2.weight encoder.mid.block_2.norm2.bias encoder.mid.block_2.conv2.weight encoder.mid.block_2.conv2.bias encoder.norm_out.weight encoder.norm_out.bias decoder.mid.block_1.norm1.weight decoder.mid.block_1.norm1.bias decoder.mid.block_1.conv1.weight decoder.mid.block_1.conv1.bias decoder.mid.block_1.norm2.weight decoder.mid.block_1.norm2.bias decoder.mid.block_1.conv2.weight decoder.mid.block_1.conv2.bias decoder.mid.attn_1.norm.weight decoder.mid.attn_1.norm.bias decoder.mid.attn_1.q.weight decoder.mid.attn_1.q.bias decoder.mid.attn_1.k.weight decoder.mid.attn_1.k.bias decoder.mid.attn_1.v.weight decoder.mid.attn_1.v.bias decoder.mid.attn_1.proj_out.weight 
decoder.mid.attn_1.proj_out.bias decoder.mid.block_2.norm1.weight decoder.mid.block_2.norm1.bias decoder.mid.block_2.conv1.weight decoder.mid.block_2.conv1.bias decoder.mid.block_2.norm2.weight decoder.mid.block_2.norm2.bias decoder.mid.block_2.conv2.weight decoder.mid.block_2.conv2.bias decoder.up.0.block.0.norm1.weight decoder.up.0.block.0.norm1.bias decoder.up.0.block.0.conv1.weight decoder.up.0.block.0.conv1.bias decoder.up.0.block.0.norm2.weight decoder.up.0.block.0.norm2.bias decoder.up.0.block.0.conv2.weight decoder.up.0.block.0.conv2.bias decoder.up.0.block.0.nin_shortcut.weight decoder.up.0.block.0.nin_shortcut.bias decoder.up.0.block.1.norm1.weight decoder.up.0.block.1.norm1.bias decoder.up.0.block.1.conv1.weight decoder.up.0.block.1.conv1.bias decoder.up.0.block.1.norm2.weight decoder.up.0.block.1.norm2.bias decoder.up.0.block.1.conv2.weight decoder.up.0.block.1.conv2.bias decoder.up.0.block.2.norm1.weight decoder.up.0.block.2.norm1.bias decoder.up.0.block.2.conv1.weight decoder.up.0.block.2.conv1.bias decoder.up.0.block.2.norm2.weight decoder.up.0.block.2.norm2.bias decoder.up.0.block.2.conv2.weight decoder.up.0.block.2.conv2.bias decoder.up.1.block.0.norm1.weight decoder.up.1.block.0.norm1.bias decoder.up.1.block.0.conv1.weight decoder.up.1.block.0.conv1.bias decoder.up.1.block.0.norm2.weight decoder.up.1.block.0.norm2.bias decoder.up.1.block.0.conv2.weight decoder.up.1.block.0.conv2.bias decoder.up.1.block.0.nin_shortcut.weight decoder.up.1.block.0.nin_shortcut.bias decoder.up.1.block.1.norm1.weight decoder.up.1.block.1.norm1.bias decoder.up.1.block.1.conv1.weight decoder.up.1.block.1.conv1.bias decoder.up.1.block.1.norm2.weight decoder.up.1.block.1.norm2.bias decoder.up.1.block.1.conv2.weight decoder.up.1.block.1.conv2.bias decoder.up.1.block.2.norm1.weight decoder.up.1.block.2.norm1.bias decoder.up.1.block.2.conv1.weight decoder.up.1.block.2.conv1.bias decoder.up.1.block.2.norm2.weight decoder.up.1.block.2.norm2.bias 
decoder.up.1.block.2.conv2.weight decoder.up.1.block.2.conv2.bias decoder.up.1.upsample.conv.weight decoder.up.1.upsample.conv.bias decoder.up.2.block.0.norm1.weight decoder.up.2.block.0.norm1.bias decoder.up.2.block.0.conv1.weight decoder.up.2.block.0.conv1.bias decoder.up.2.block.0.norm2.weight decoder.up.2.block.0.norm2.bias decoder.up.2.block.0.conv2.weight decoder.up.2.block.0.conv2.bias decoder.up.2.block.1.norm1.weight decoder.up.2.block.1.norm1.bias decoder.up.2.block.1.conv1.weight decoder.up.2.block.1.conv1.bias decoder.up.2.block.1.norm2.weight decoder.up.2.block.1.norm2.bias decoder.up.2.block.1.conv2.weight decoder.up.2.block.1.conv2.bias decoder.up.2.block.2.norm1.weight decoder.up.2.block.2.norm1.bias decoder.up.2.block.2.conv1.weight decoder.up.2.block.2.conv1.bias decoder.up.2.block.2.norm2.weight decoder.up.2.block.2.norm2.bias decoder.up.2.block.2.conv2.weight decoder.up.2.block.2.conv2.bias decoder.up.2.upsample.conv.weight decoder.up.2.upsample.conv.bias decoder.up.3.block.0.norm1.weight decoder.up.3.block.0.norm1.bias decoder.up.3.block.0.conv1.weight decoder.up.3.block.0.conv1.bias decoder.up.3.block.0.norm2.weight decoder.up.3.block.0.norm2.bias decoder.up.3.block.0.conv2.weight decoder.up.3.block.0.conv2.bias decoder.up.3.block.1.norm1.weight decoder.up.3.block.1.norm1.bias decoder.up.3.block.1.conv1.weight decoder.up.3.block.1.conv1.bias decoder.up.3.block.1.norm2.weight decoder.up.3.block.1.norm2.bias decoder.up.3.block.1.conv2.weight decoder.up.3.block.1.conv2.bias decoder.up.3.block.2.norm1.weight decoder.up.3.block.2.norm1.bias decoder.up.3.block.2.conv1.weight decoder.up.3.block.2.conv1.bias decoder.up.3.block.2.norm2.weight decoder.up.3.block.2.norm2.bias decoder.up.3.block.2.conv2.weight decoder.up.3.block.2.conv2.bias decoder.up.3.upsample.conv.weight decoder.up.3.upsample.conv.bias decoder.norm_out.weight decoder.norm_out.bias

236 unexpected keys

encoder.conv_norm_out.bias encoder.conv_norm_out.weight encoder.down_blocks.0.downsamplers.0.conv.bias encoder.down_blocks.0.downsamplers.0.conv.weight encoder.down_blocks.0.resnets.0.conv1.bias encoder.down_blocks.0.resnets.0.conv1.weight encoder.down_blocks.0.resnets.0.conv2.bias encoder.down_blocks.0.resnets.0.conv2.weight encoder.down_blocks.0.resnets.0.norm1.bias encoder.down_blocks.0.resnets.0.norm1.weight encoder.down_blocks.0.resnets.0.norm2.bias encoder.down_blocks.0.resnets.0.norm2.weight encoder.down_blocks.0.resnets.1.conv1.bias encoder.down_blocks.0.resnets.1.conv1.weight encoder.down_blocks.0.resnets.1.conv2.bias encoder.down_blocks.0.resnets.1.conv2.weight encoder.down_blocks.0.resnets.1.norm1.bias encoder.down_blocks.0.resnets.1.norm1.weight encoder.down_blocks.0.resnets.1.norm2.bias encoder.down_blocks.0.resnets.1.norm2.weight encoder.down_blocks.1.downsamplers.0.conv.bias encoder.down_blocks.1.downsamplers.0.conv.weight encoder.down_blocks.1.resnets.0.conv1.bias encoder.down_blocks.1.resnets.0.conv1.weight encoder.down_blocks.1.resnets.0.conv2.bias encoder.down_blocks.1.resnets.0.conv2.weight encoder.down_blocks.1.resnets.0.conv_shortcut.bias encoder.down_blocks.1.resnets.0.conv_shortcut.weight encoder.down_blocks.1.resnets.0.norm1.bias encoder.down_blocks.1.resnets.0.norm1.weight encoder.down_blocks.1.resnets.0.norm2.bias encoder.down_blocks.1.resnets.0.norm2.weight encoder.down_blocks.1.resnets.1.conv1.bias encoder.down_blocks.1.resnets.1.conv1.weight encoder.down_blocks.1.resnets.1.conv2.bias encoder.down_blocks.1.resnets.1.conv2.weight encoder.down_blocks.1.resnets.1.norm1.bias encoder.down_blocks.1.resnets.1.norm1.weight encoder.down_blocks.1.resnets.1.norm2.bias encoder.down_blocks.1.resnets.1.norm2.weight encoder.down_blocks.2.downsamplers.0.conv.bias encoder.down_blocks.2.downsamplers.0.conv.weight encoder.down_blocks.2.resnets.0.conv1.bias encoder.down_blocks.2.resnets.0.conv1.weight encoder.down_blocks.2.resnets.0.conv2.bias 
encoder.down_blocks.2.resnets.0.conv2.weight encoder.down_blocks.2.resnets.0.conv_shortcut.bias encoder.down_blocks.2.resnets.0.conv_shortcut.weight encoder.down_blocks.2.resnets.0.norm1.bias encoder.down_blocks.2.resnets.0.norm1.weight encoder.down_blocks.2.resnets.0.norm2.bias encoder.down_blocks.2.resnets.0.norm2.weight encoder.down_blocks.2.resnets.1.conv1.bias encoder.down_blocks.2.resnets.1.conv1.weight encoder.down_blocks.2.resnets.1.conv2.bias encoder.down_blocks.2.resnets.1.conv2.weight encoder.down_blocks.2.resnets.1.norm1.bias encoder.down_blocks.2.resnets.1.norm1.weight encoder.down_blocks.2.resnets.1.norm2.bias encoder.down_blocks.2.resnets.1.norm2.weight encoder.down_blocks.3.resnets.0.conv1.bias encoder.down_blocks.3.resnets.0.conv1.weight encoder.down_blocks.3.resnets.0.conv2.bias encoder.down_blocks.3.resnets.0.conv2.weight encoder.down_blocks.3.resnets.0.norm1.bias encoder.down_blocks.3.resnets.0.norm1.weight encoder.down_blocks.3.resnets.0.norm2.bias encoder.down_blocks.3.resnets.0.norm2.weight encoder.down_blocks.3.resnets.1.conv1.bias encoder.down_blocks.3.resnets.1.conv1.weight encoder.down_blocks.3.resnets.1.conv2.bias encoder.down_blocks.3.resnets.1.conv2.weight encoder.down_blocks.3.resnets.1.norm1.bias encoder.down_blocks.3.resnets.1.norm1.weight encoder.down_blocks.3.resnets.1.norm2.bias encoder.down_blocks.3.resnets.1.norm2.weight encoder.mid_block.attentions.0.group_norm.bias encoder.mid_block.attentions.0.group_norm.weight encoder.mid_block.attentions.0.to_k.bias encoder.mid_block.attentions.0.to_k.weight encoder.mid_block.attentions.0.to_out.0.bias encoder.mid_block.attentions.0.to_out.0.weight encoder.mid_block.attentions.0.to_q.bias encoder.mid_block.attentions.0.to_q.weight encoder.mid_block.attentions.0.to_v.bias encoder.mid_block.attentions.0.to_v.weight encoder.mid_block.resnets.0.conv1.bias encoder.mid_block.resnets.0.conv1.weight encoder.mid_block.resnets.0.conv2.bias encoder.mid_block.resnets.0.conv2.weight 
encoder.mid_block.resnets.0.norm1.bias encoder.mid_block.resnets.0.norm1.weight encoder.mid_block.resnets.0.norm2.bias encoder.mid_block.resnets.0.norm2.weight encoder.mid_block.resnets.1.conv1.bias encoder.mid_block.resnets.1.conv1.weight encoder.mid_block.resnets.1.conv2.bias encoder.mid_block.resnets.1.conv2.weight encoder.mid_block.resnets.1.norm1.bias encoder.mid_block.resnets.1.norm1.weight encoder.mid_block.resnets.1.norm2.bias encoder.mid_block.resnets.1.norm2.weight decoder.conv_norm_out.bias decoder.conv_norm_out.weight decoder.mid_block.attentions.0.group_norm.bias decoder.mid_block.attentions.0.group_norm.weight decoder.mid_block.attentions.0.to_k.bias decoder.mid_block.attentions.0.to_k.weight decoder.mid_block.attentions.0.to_out.0.bias decoder.mid_block.attentions.0.to_out.0.weight decoder.mid_block.attentions.0.to_q.bias decoder.mid_block.attentions.0.to_q.weight decoder.mid_block.attentions.0.to_v.bias decoder.mid_block.attentions.0.to_v.weight decoder.mid_block.resnets.0.conv1.bias decoder.mid_block.resnets.0.conv1.weight decoder.mid_block.resnets.0.conv2.bias decoder.mid_block.resnets.0.conv2.weight decoder.mid_block.resnets.0.norm1.bias decoder.mid_block.resnets.0.norm1.weight decoder.mid_block.resnets.0.norm2.bias decoder.mid_block.resnets.0.norm2.weight decoder.mid_block.resnets.1.conv1.bias decoder.mid_block.resnets.1.conv1.weight decoder.mid_block.resnets.1.conv2.bias decoder.mid_block.resnets.1.conv2.weight decoder.mid_block.resnets.1.norm1.bias decoder.mid_block.resnets.1.norm1.weight decoder.mid_block.resnets.1.norm2.bias decoder.mid_block.resnets.1.norm2.weight decoder.up_blocks.0.resnets.0.conv1.bias decoder.up_blocks.0.resnets.0.conv1.weight decoder.up_blocks.0.resnets.0.conv2.bias decoder.up_blocks.0.resnets.0.conv2.weight decoder.up_blocks.0.resnets.0.norm1.bias decoder.up_blocks.0.resnets.0.norm1.weight decoder.up_blocks.0.resnets.0.norm2.bias decoder.up_blocks.0.resnets.0.norm2.weight decoder.up_blocks.0.resnets.1.conv1.bias 
decoder.up_blocks.0.resnets.1.conv1.weight decoder.up_blocks.0.resnets.1.conv2.bias decoder.up_blocks.0.resnets.1.conv2.weight decoder.up_blocks.0.resnets.1.norm1.bias decoder.up_blocks.0.resnets.1.norm1.weight decoder.up_blocks.0.resnets.1.norm2.bias decoder.up_blocks.0.resnets.1.norm2.weight decoder.up_blocks.0.resnets.2.conv1.bias decoder.up_blocks.0.resnets.2.conv1.weight decoder.up_blocks.0.resnets.2.conv2.bias decoder.up_blocks.0.resnets.2.conv2.weight decoder.up_blocks.0.resnets.2.norm1.bias decoder.up_blocks.0.resnets.2.norm1.weight decoder.up_blocks.0.resnets.2.norm2.bias decoder.up_blocks.0.resnets.2.norm2.weight decoder.up_blocks.0.upsamplers.0.conv.bias decoder.up_blocks.0.upsamplers.0.conv.weight decoder.up_blocks.1.resnets.0.conv1.bias decoder.up_blocks.1.resnets.0.conv1.weight decoder.up_blocks.1.resnets.0.conv2.bias decoder.up_blocks.1.resnets.0.conv2.weight decoder.up_blocks.1.resnets.0.norm1.bias decoder.up_blocks.1.resnets.0.norm1.weight decoder.up_blocks.1.resnets.0.norm2.bias decoder.up_blocks.1.resnets.0.norm2.weight decoder.up_blocks.1.resnets.1.conv1.bias decoder.up_blocks.1.resnets.1.conv1.weight decoder.up_blocks.1.resnets.1.conv2.bias decoder.up_blocks.1.resnets.1.conv2.weight decoder.up_blocks.1.resnets.1.norm1.bias decoder.up_blocks.1.resnets.1.norm1.weight decoder.up_blocks.1.resnets.1.norm2.bias decoder.up_blocks.1.resnets.1.norm2.weight decoder.up_blocks.1.resnets.2.conv1.bias decoder.up_blocks.1.resnets.2.conv1.weight decoder.up_blocks.1.resnets.2.conv2.bias decoder.up_blocks.1.resnets.2.conv2.weight decoder.up_blocks.1.resnets.2.norm1.bias decoder.up_blocks.1.resnets.2.norm1.weight decoder.up_blocks.1.resnets.2.norm2.bias decoder.up_blocks.1.resnets.2.norm2.weight decoder.up_blocks.1.upsamplers.0.conv.bias decoder.up_blocks.1.upsamplers.0.conv.weight decoder.up_blocks.2.resnets.0.conv1.bias decoder.up_blocks.2.resnets.0.conv1.weight decoder.up_blocks.2.resnets.0.conv2.bias decoder.up_blocks.2.resnets.0.conv2.weight 
decoder.up_blocks.2.resnets.0.conv_shortcut.bias decoder.up_blocks.2.resnets.0.conv_shortcut.weight decoder.up_blocks.2.resnets.0.norm1.bias decoder.up_blocks.2.resnets.0.norm1.weight decoder.up_blocks.2.resnets.0.norm2.bias decoder.up_blocks.2.resnets.0.norm2.weight decoder.up_blocks.2.resnets.1.conv1.bias decoder.up_blocks.2.resnets.1.conv1.weight decoder.up_blocks.2.resnets.1.conv2.bias decoder.up_blocks.2.resnets.1.conv2.weight decoder.up_blocks.2.resnets.1.norm1.bias decoder.up_blocks.2.resnets.1.norm1.weight decoder.up_blocks.2.resnets.1.norm2.bias decoder.up_blocks.2.resnets.1.norm2.weight decoder.up_blocks.2.resnets.2.conv1.bias decoder.up_blocks.2.resnets.2.conv1.weight decoder.up_blocks.2.resnets.2.conv2.bias decoder.up_blocks.2.resnets.2.conv2.weight decoder.up_blocks.2.resnets.2.norm1.bias decoder.up_blocks.2.resnets.2.norm1.weight decoder.up_blocks.2.resnets.2.norm2.bias decoder.up_blocks.2.resnets.2.norm2.weight decoder.up_blocks.2.upsamplers.0.conv.bias decoder.up_blocks.2.upsamplers.0.conv.weight decoder.up_blocks.3.resnets.0.conv1.bias decoder.up_blocks.3.resnets.0.conv1.weight decoder.up_blocks.3.resnets.0.conv2.bias decoder.up_blocks.3.resnets.0.conv2.weight decoder.up_blocks.3.resnets.0.conv_shortcut.bias decoder.up_blocks.3.resnets.0.conv_shortcut.weight decoder.up_blocks.3.resnets.0.norm1.bias decoder.up_blocks.3.resnets.0.norm1.weight decoder.up_blocks.3.resnets.0.norm2.bias decoder.up_blocks.3.resnets.0.norm2.weight decoder.up_blocks.3.resnets.1.conv1.bias decoder.up_blocks.3.resnets.1.conv1.weight decoder.up_blocks.3.resnets.1.conv2.bias decoder.up_blocks.3.resnets.1.conv2.weight decoder.up_blocks.3.resnets.1.norm1.bias decoder.up_blocks.3.resnets.1.norm1.weight decoder.up_blocks.3.resnets.1.norm2.bias decoder.up_blocks.3.resnets.1.norm2.weight decoder.up_blocks.3.resnets.2.conv1.bias decoder.up_blocks.3.resnets.2.conv1.weight decoder.up_blocks.3.resnets.2.conv2.bias decoder.up_blocks.3.resnets.2.conv2.weight 
decoder.up_blocks.3.resnets.2.norm1.bias decoder.up_blocks.3.resnets.2.norm1.weight decoder.up_blocks.3.resnets.2.norm2.bias decoder.up_blocks.3.resnets.2.norm2.weight
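
The unexpected keys follow the diffusers naming scheme (down_blocks, resnets, to_q, ...), while the missing keys follow the layout the model built by load_ae expects. Comparing the two lists suggests the renaming below. Note in particular that the decoder up blocks appear to be indexed in reverse order: the nin_shortcut keys are missing for decoder.up.0 and decoder.up.1, but the conv_shortcut keys are unexpected for decoder.up_blocks.3 and decoder.up_blocks.2. Mapping up_blocks.i to up.i directly would therefore pair tensors of different sizes. This is a minimal sketch inferred only from the key lists above, not the repository's own conversion code; the function name and the NUM_UP_BLOCKS constant are hypothetical.

```python
import re

# Assumption: four decoder up blocks (decoder.up.0 .. decoder.up.3 in the lists above).
NUM_UP_BLOCKS = 4


def diffusers_to_flux_key(key: str) -> str:
    """Rename one diffusers-style VAE key to the layout the missing keys use."""
    # Output norm and shortcut convolutions only differ by name.
    key = key.replace("conv_norm_out", "norm_out")
    key = key.replace("conv_shortcut", "nin_shortcut")

    # Mid block: resnets.0/1 -> block_1/2, attentions.0 -> attn_1.
    key = re.sub(
        r"mid_block\.resnets\.(\d+)",
        lambda m: f"mid.block_{int(m.group(1)) + 1}",
        key,
    )
    key = key.replace("mid_block.attentions.0.group_norm", "mid.attn_1.norm")
    key = key.replace("mid_block.attentions.0.to_out.0", "mid.attn_1.proj_out")
    key = re.sub(r"mid_block\.attentions\.0\.to_([qkv])", r"mid.attn_1.\1", key)

    # Encoder down blocks keep their index.
    key = re.sub(r"down_blocks\.(\d+)\.resnets\.(\d+)", r"down.\1.block.\2", key)
    key = re.sub(r"down_blocks\.(\d+)\.downsamplers\.0", r"down.\1.downsample", key)

    # Decoder up blocks: the index is reversed (up_blocks.0 <-> up.3).
    def flip(m: re.Match) -> str:
        return str(NUM_UP_BLOCKS - 1 - int(m.group(1)))

    key = re.sub(
        r"up_blocks\.(\d+)\.resnets\.(\d+)",
        lambda m: f"up.{flip(m)}.block.{m.group(2)}",
        key,
    )
    key = re.sub(
        r"up_blocks\.(\d+)\.upsamplers\.0",
        lambda m: f"up.{flip(m)}.upsample",
        key,
    )
    return key
```

Even with this renaming, the attention projection weights (q, k, v, proj_out) may still differ in shape. As far as I know, the diffusers layout stores them as linear weights of shape (C, C), whereas a model that implements them as 1x1 convolutions expects (C, C, 1, 1). If that is the case here, adding two trailing singleton dimensions to those tensors would be needed, which is worth verifying by printing the shapes on both sides before loading.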

Kaiwen-Zhu, Nov 02 '24 08:11