Help converting a SAT model to a diffusers model
Neither the fully fine-tuned model nor the LoRA fine-tuned model can be converted to a diffusers model.
The only difference between the commands that produced Convert Log 1 and Convert Log 2 is transformer_ckpt_path:
Log 1: transformer_ckpt_path points to the original SAT model
Log 2: transformer_ckpt_path points to the fully fine-tuned model
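Before re-running the converter, it can help to diff the key sets of the two checkpoints directly; if the fine-tuned file carries extra modules (text encoder, VAE) the mismatch shows up immediately. A minimal sketch of that comparison — the real dicts would come from `torch.load(...)` on the two `mp_rank_00_model_states.pt` paths above (the `"module"` wrapper key is an assumption about how the DeepSpeed checkpoint is nested); the toy dicts below only illustrate the shape of the problem:

```python
def diff_keys(reference, candidate):
    """Return (keys only in candidate, keys only in reference), both sorted."""
    ref, cand = set(reference), set(candidate)
    return sorted(cand - ref), sorted(ref - cand)

# In practice, load the real state dicts, e.g. (paths from the logs,
# "module" nesting is an assumption about the DeepSpeed checkpoint layout):
#   ref  = torch.load(".../transformer/1000/mp_rank_00_model_states.pt",
#                     map_location="cpu")["module"]
#   cand = torch.load(".../50-ema/mp_rank_00_model_states.pt",
#                     map_location="cpu")["module"]

# Toy illustration: the fine-tuned checkpoint bundles extra modules.
reference = {"model.diffusion_model.blocks.0.attn.q.weight": None}
candidate = {
    "model.diffusion_model.blocks.0.attn.q.weight": None,
    "first_stage_model.decoder.conv_in.conv.weight": None,      # VAE weights slipped in
    "conditioner.embedders.0.transformer_blocks.shared.weight": None,  # T5 slipped in
}
unexpected, missing = diff_keys(reference, candidate)
```

If `unexpected` is non-empty while `missing` is empty, the fine-tune saved more than just the transformer, which is exactly the situation the converter's strict `load_state_dict` rejects.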
Convert Log 1 (original 2B SAT model to diffusers model)
(cogvideo) root@alphacode-ttv-a100-80g-gpu:~/Ori_CogVideo/tools# python convert_weight_sat2hf.py --transformer_ckpt_path /root/Ori_CogVideo/CogVideoX-2b-sat/transformer/1000/mp_rank_00_model_states.pt --vae_ckpt_path /root/Ori_CogVideo/CogVideoX-2b-sat/vae/3d-vae.pt --output_path /root/Ori_CogVideo/sat/ckpts_2b/lora-disney-09-12-14-52/50-ema --text_encoder_cache_dir /root/Ori_CogVideo/t5-v1_1-xxl
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
/root/Ori_CogVideo/tools/convert_weight_sat2hf.py:161: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
original_state_dict = get_state_dict(torch.load(ckpt_path, map_location="cpu", mmap=True))
/root/Ori_CogVideo/tools/convert_weight_sat2hf.py:185: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
original_state_dict = get_state_dict(torch.load(ckpt_path, map_location="cpu", mmap=True))
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
[2024-09-12 15:26:11,746] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@autocast_custom_fwd
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@autocast_custom_bwd
Convert Log 2 (fully fine-tuned SAT model to diffusers model)
(cogvideo) root@alphacode-ttv-a100-80g-gpu:~/Ori_CogVideo/tools# python convert_weight_sat2hf.py --transformer_ckpt_path /root/Ori_CogVideo/sat/ckpts_2b/lora-disney-09-12-14-52/50-ema/mp_rank_00_model_states.pt --vae_ckpt_path /root/Ori_CogVideo/CogVideoX-2b-sat/vae/3d-vae.pt --output_path /root/Ori_CogVideo/sat/ckpts_2b/lora-disney-09-12-14-52/50-ema --text_encoder_cache_dir /root/Ori_CogVideo/t5-v1_1-xxl
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
/root/Ori_CogVideo/tools/convert_weight_sat2hf.py:161: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
original_state_dict = get_state_dict(torch.load(ckpt_path, map_location="cpu", mmap=True))
Traceback (most recent call last):
File "/root/Ori_CogVideo/tools/convert_weight_sat2hf.py", line 246, in <module>
transformer = convert_transformer(
^^^^^^^^^^^^^^^^^^^^
File "/root/Ori_CogVideo/tools/convert_weight_sat2hf.py", line 180, in convert_transformer
transformer.load_state_dict(original_state_dict, strict=True)
File "/root/miniconda3/envs/cogvideo/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CogVideoXTransformer3DModel:
Unexpected key(s) in state_dict: "mixins.pos_embed.pos_embedding", "0.transformer_blocks.shared.weight", "0.transformer_blocks.encoder.block.0.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.0.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.0.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.0.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.0.layer.0.SelfAttention.relative_attn1_bias.weight", "0.transformer_blocks.encoder.block.0.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.0.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.0.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.0.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.0.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.1.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.1.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.1.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.1.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.1.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.1.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.1.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.1.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.1.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.2.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.2.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.2.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.2.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.2.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.2.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.2.layer.1.DenseReluDense.wi_1.weight", 
"0.transformer_blocks.encoder.block.2.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.2.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.3.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.3.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.3.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.3.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.3.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.3.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.3.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.3.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.3.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.4.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.4.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.4.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.4.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.4.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.4.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.4.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.4.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.4.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.5.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.5.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.5.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.5.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.5.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.5.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.5.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.5.layer.1.DenseReluDense.wo.weight", 
"0.transformer_blocks.encoder.block.5.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.6.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.6.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.6.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.6.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.6.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.6.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.6.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.6.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.6.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.7.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.7.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.7.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.7.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.7.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.7.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.7.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.7.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.7.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.8.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.8.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.8.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.8.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.8.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.8.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.8.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.8.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.8.layer.1.layer_norm.weight", 
"0.transformer_blocks.encoder.block.9.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.9.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.9.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.9.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.9.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.9.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.9.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.9.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.9.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.10.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.10.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.10.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.10.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.10.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.10.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.10.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.10.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.10.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.11.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.11.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.11.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.11.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.11.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.11.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.11.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.11.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.11.layer.1.layer_norm.weight", 
"0.transformer_blocks.encoder.block.12.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.12.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.12.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.12.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.12.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.12.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.12.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.12.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.12.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.13.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.13.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.13.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.13.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.13.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.13.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.13.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.13.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.13.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.14.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.14.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.14.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.14.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.14.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.14.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.14.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.14.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.14.layer.1.layer_norm.weight", 
"0.transformer_blocks.encoder.block.15.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.15.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.15.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.15.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.15.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.15.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.15.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.15.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.15.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.16.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.16.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.16.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.16.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.16.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.16.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.16.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.16.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.16.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.17.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.17.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.17.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.17.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.17.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.17.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.17.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.17.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.17.layer.1.layer_norm.weight", 
"0.transformer_blocks.encoder.block.18.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.18.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.18.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.18.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.18.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.18.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.18.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.18.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.18.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.19.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.19.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.19.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.19.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.19.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.19.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.19.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.19.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.19.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.20.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.20.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.20.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.20.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.20.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.20.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.20.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.20.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.20.layer.1.layer_norm.weight", 
"0.transformer_blocks.encoder.block.21.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.21.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.21.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.21.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.21.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.21.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.21.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.21.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.21.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.22.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.22.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.22.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.22.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.22.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.22.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.22.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.22.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.22.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.block.23.layer.0.SelfAttention.q.weight", "0.transformer_blocks.encoder.block.23.layer.0.SelfAttention.k.weight", "0.transformer_blocks.encoder.block.23.layer.0.SelfAttention.v.weight", "0.transformer_blocks.encoder.block.23.layer.0.SelfAttention.o.weight", "0.transformer_blocks.encoder.block.23.layer.0.layer_norm.weight", "0.transformer_blocks.encoder.block.23.layer.1.DenseReluDense.wi_0.weight", "0.transformer_blocks.encoder.block.23.layer.1.DenseReluDense.wi_1.weight", "0.transformer_blocks.encoder.block.23.layer.1.DenseReluDense.wo.weight", "0.transformer_blocks.encoder.block.23.layer.1.layer_norm.weight", "0.transformer_blocks.encoder.final_layer_norm.weight", 
"der.conv_in.conv.weight", "der.conv_in.conv.bias", "der.down.0.block.0.norm1.weight", "der.down.0.block.0.norm1.bias", "der.down.0.block.0.conv1.conv.weight", "der.down.0.block.0.conv1.conv.bias", "der.down.0.block.0.norm2.weight", "der.down.0.block.0.norm2.bias", "der.down.0.block.0.conv2.conv.weight", "der.down.0.block.0.conv2.conv.bias", "der.down.0.block.1.norm1.weight", "der.down.0.block.1.norm1.bias", "der.down.0.block.1.conv1.conv.weight", "der.down.0.block.1.conv1.conv.bias", "der.down.0.block.1.norm2.weight", "der.down.0.block.1.norm2.bias", "der.down.0.block.1.conv2.conv.weight", "der.down.0.block.1.conv2.conv.bias", "der.down.0.block.2.norm1.weight", "der.down.0.block.2.norm1.bias", "der.down.0.block.2.conv1.conv.weight", "der.down.0.block.2.conv1.conv.bias", "der.down.0.block.2.norm2.weight", "der.down.0.block.2.norm2.bias", "der.down.0.block.2.conv2.conv.weight", "der.down.0.block.2.conv2.conv.bias", "der.down.0.downsample.conv.weight", "der.down.0.downsample.conv.bias", "der.down.1.block.0.norm1.weight", "der.down.1.block.0.norm1.bias", "der.down.1.block.0.conv1.conv.weight", "der.down.1.block.0.conv1.conv.bias", "der.down.1.block.0.norm2.weight", "der.down.1.block.0.norm2.bias", "der.down.1.block.0.conv2.conv.weight", "der.down.1.block.0.conv2.conv.bias", "der.down.1.block.0.nin_shortcut.weight", "der.down.1.block.0.nin_shortcut.bias", "der.down.1.block.1.norm1.weight", "der.down.1.block.1.norm1.bias", "der.down.1.block.1.conv1.conv.weight", "der.down.1.block.1.conv1.conv.bias", "der.down.1.block.1.norm2.weight", "der.down.1.block.1.norm2.bias", "der.down.1.block.1.conv2.conv.weight", "der.down.1.block.1.conv2.conv.bias", "der.down.1.block.2.norm1.weight", "der.down.1.block.2.norm1.bias", "der.down.1.block.2.conv1.conv.weight", "der.down.1.block.2.conv1.conv.bias", "der.down.1.block.2.norm2.weight", "der.down.1.block.2.norm2.bias", "der.down.1.block.2.conv2.conv.weight", "der.down.1.block.2.conv2.conv.bias", "der.down.1.downsample.conv.weight", 
"der.down.1.downsample.conv.bias", "der.down.2.block.0.norm1.weight", "der.down.2.block.0.norm1.bias", "der.down.2.block.0.conv1.conv.weight", "der.down.2.block.0.conv1.conv.bias", "der.down.2.block.0.norm2.weight", "der.down.2.block.0.norm2.bias", "der.down.2.block.0.conv2.conv.weight", "der.down.2.block.0.conv2.conv.bias", "der.down.2.block.1.norm1.weight", "der.down.2.block.1.norm1.bias", "der.down.2.block.1.conv1.conv.weight", "der.down.2.block.1.conv1.conv.bias", "der.down.2.block.1.norm2.weight", "der.down.2.block.1.norm2.bias", "der.down.2.block.1.conv2.conv.weight", "der.down.2.block.1.conv2.conv.bias", "der.down.2.block.2.norm1.weight", "der.down.2.block.2.norm1.bias", "der.down.2.block.2.conv1.conv.weight", "der.down.2.block.2.conv1.conv.bias", "der.down.2.block.2.norm2.weight", "der.down.2.block.2.norm2.bias", "der.down.2.block.2.conv2.conv.weight", "der.down.2.block.2.conv2.conv.bias", "der.down.2.downsample.conv.weight", "der.down.2.downsample.conv.bias", "der.down.3.block.0.norm1.weight", "der.down.3.block.0.norm1.bias", "der.down.3.block.0.conv1.conv.weight", "der.down.3.block.0.conv1.conv.bias", "der.down.3.block.0.norm2.weight", "der.down.3.block.0.norm2.bias", "der.down.3.block.0.conv2.conv.weight", "der.down.3.block.0.conv2.conv.bias", "der.down.3.block.0.nin_shortcut.weight", "der.down.3.block.0.nin_shortcut.bias", "der.down.3.block.1.norm1.weight", "der.down.3.block.1.norm1.bias", "der.down.3.block.1.conv1.conv.weight", "der.down.3.block.1.conv1.conv.bias", "der.down.3.block.1.norm2.weight", "der.down.3.block.1.norm2.bias", "der.down.3.block.1.conv2.conv.weight", "der.down.3.block.1.conv2.conv.bias", "der.down.3.block.2.norm1.weight", "der.down.3.block.2.norm1.bias", "der.down.3.block.2.conv1.conv.weight", "der.down.3.block.2.conv1.conv.bias", "der.down.3.block.2.norm2.weight", "der.down.3.block.2.norm2.bias", "der.down.3.block.2.conv2.conv.weight", "der.down.3.block.2.conv2.conv.bias", "der.mid.block_1.norm1.weight", 
"der.mid.block_1.norm1.bias", "der.mid.block_1.conv1.conv.weight", "der.mid.block_1.conv1.conv.bias", "der.mid.block_1.norm2.weight", "der.mid.block_1.norm2.bias", "der.mid.block_1.conv2.conv.weight", "der.mid.block_1.conv2.conv.bias", "der.mid.block_2.norm1.weight", "der.mid.block_2.norm1.bias", "der.mid.block_2.conv1.conv.weight", "der.mid.block_2.conv1.conv.bias", "der.mid.block_2.norm2.weight", "der.mid.block_2.norm2.bias", "der.mid.block_2.conv2.conv.weight", "der.mid.block_2.conv2.conv.bias", "der.norm_out.weight", "der.norm_out.bias", "der.conv_out.conv.weight", "der.conv_out.conv.bias", "der.mid.block_1.norm1.norm_layer.weight", "der.mid.block_1.norm1.norm_layer.bias", "der.mid.block_1.norm1.conv_y.conv.weight", "der.mid.block_1.norm1.conv_y.conv.bias", "der.mid.block_1.norm1.conv_b.conv.weight", "der.mid.block_1.norm1.conv_b.conv.bias", "der.mid.block_1.norm2.norm_layer.weight", "der.mid.block_1.norm2.norm_layer.bias", "der.mid.block_1.norm2.conv_y.conv.weight", "der.mid.block_1.norm2.conv_y.conv.bias", "der.mid.block_1.norm2.conv_b.conv.weight", "der.mid.block_1.norm2.conv_b.conv.bias", "der.mid.block_2.norm1.norm_layer.weight", "der.mid.block_2.norm1.norm_layer.bias", "der.mid.block_2.norm1.conv_y.conv.weight", "der.mid.block_2.norm1.conv_y.conv.bias", "der.mid.block_2.norm1.conv_b.conv.weight", "der.mid.block_2.norm1.conv_b.conv.bias", "der.mid.block_2.norm2.norm_layer.weight", "der.mid.block_2.norm2.norm_layer.bias", "der.mid.block_2.norm2.conv_y.conv.weight", "der.mid.block_2.norm2.conv_y.conv.bias", "der.mid.block_2.norm2.conv_b.conv.weight", "der.mid.block_2.norm2.conv_b.conv.bias", "der.up.0.block.0.norm1.norm_layer.weight", "der.up.0.block.0.norm1.norm_layer.bias", "der.up.0.block.0.norm1.conv_y.conv.weight", "der.up.0.block.0.norm1.conv_y.conv.bias", "der.up.0.block.0.norm1.conv_b.conv.weight", "der.up.0.block.0.norm1.conv_b.conv.bias", "der.up.0.block.0.conv1.conv.weight", "der.up.0.block.0.conv1.conv.bias", 
"der.up.0.block.0.norm2.norm_layer.weight", "der.up.0.block.0.norm2.norm_layer.bias", "der.up.0.block.0.norm2.conv_y.conv.weight", "der.up.0.block.0.norm2.conv_y.conv.bias", "der.up.0.block.0.norm2.conv_b.conv.weight", "der.up.0.block.0.norm2.conv_b.conv.bias", "der.up.0.block.0.conv2.conv.weight", "der.up.0.block.0.conv2.conv.bias", "der.up.0.block.0.nin_shortcut.weight", "der.up.0.block.0.nin_shortcut.bias", "der.up.0.block.1.norm1.norm_layer.weight", "der.up.0.block.1.norm1.norm_layer.bias", "der.up.0.block.1.norm1.conv_y.conv.weight", "der.up.0.block.1.norm1.conv_y.conv.bias", "der.up.0.block.1.norm1.conv_b.conv.weight", "der.up.0.block.1.norm1.conv_b.conv.bias", "der.up.0.block.1.conv1.conv.weight", "der.up.0.block.1.conv1.conv.bias", "der.up.0.block.1.norm2.norm_layer.weight", "der.up.0.block.1.norm2.norm_layer.bias", "der.up.0.block.1.norm2.conv_y.conv.weight", "der.up.0.block.1.norm2.conv_y.conv.bias", "der.up.0.block.1.norm2.conv_b.conv.weight", "der.up.0.block.1.norm2.conv_b.conv.bias", "der.up.0.block.1.conv2.conv.weight", "der.up.0.block.1.conv2.conv.bias", "der.up.0.block.2.norm1.norm_layer.weight", "der.up.0.block.2.norm1.norm_layer.bias", "der.up.0.block.2.norm1.conv_y.conv.weight", "der.up.0.block.2.norm1.conv_y.conv.bias", "der.up.0.block.2.norm1.conv_b.conv.weight", "der.up.0.block.2.norm1.conv_b.conv.bias", "der.up.0.block.2.conv1.conv.weight", "der.up.0.block.2.conv1.conv.bias", "der.up.0.block.2.norm2.norm_layer.weight", "der.up.0.block.2.norm2.norm_layer.bias", "der.up.0.block.2.norm2.conv_y.conv.weight", "der.up.0.block.2.norm2.conv_y.conv.bias", "der.up.0.block.2.norm2.conv_b.conv.weight", "der.up.0.block.2.norm2.conv_b.conv.bias", "der.up.0.block.2.conv2.conv.weight", "der.up.0.block.2.conv2.conv.bias", "der.up.0.block.3.norm1.norm_layer.weight", "der.up.0.block.3.norm1.norm_layer.bias", "der.up.0.block.3.norm1.conv_y.conv.weight", "der.up.0.block.3.norm1.conv_y.conv.bias", "der.up.0.block.3.norm1.conv_b.conv.weight", 
"der.up.0.block.3.norm1.conv_b.conv.bias", "der.up.0.block.3.conv1.conv.weight", "der.up.0.block.3.conv1.conv.bias", "der.up.0.block.3.norm2.norm_layer.weight", "der.up.0.block.3.norm2.norm_layer.bias", "der.up.0.block.3.norm2.conv_y.conv.weight", "der.up.0.block.3.norm2.conv_y.conv.bias", "der.up.0.block.3.norm2.conv_b.conv.weight", "der.up.0.block.3.norm2.conv_b.conv.bias", "der.up.0.block.3.conv2.conv.weight", "der.up.0.block.3.conv2.conv.bias", "der.up.1.block.0.norm1.norm_layer.weight", "der.up.1.block.0.norm1.norm_layer.bias", "der.up.1.block.0.norm1.conv_y.conv.weight", "der.up.1.block.0.norm1.conv_y.conv.bias", "der.up.1.block.0.norm1.conv_b.conv.weight", "der.up.1.block.0.norm1.conv_b.conv.bias", "der.up.1.block.0.conv1.conv.weight", "der.up.1.block.0.conv1.conv.bias", "der.up.1.block.0.norm2.norm_layer.weight", "der.up.1.block.0.norm2.norm_layer.bias", "der.up.1.block.0.norm2.conv_y.conv.weight", "der.up.1.block.0.norm2.conv_y.conv.bias", "der.up.1.block.0.norm2.conv_b.conv.weight", "der.up.1.block.0.norm2.conv_b.conv.bias", "der.up.1.block.0.conv2.conv.weight", "der.up.1.block.0.conv2.conv.bias", "der.up.1.block.1.norm1.norm_layer.weight", "der.up.1.block.1.norm1.norm_layer.bias", "der.up.1.block.1.norm1.conv_y.conv.weight", "der.up.1.block.1.norm1.conv_y.conv.bias", "der.up.1.block.1.norm1.conv_b.conv.weight", "der.up.1.block.1.norm1.conv_b.conv.bias", "der.up.1.block.1.conv1.conv.weight", "der.up.1.block.1.conv1.conv.bias", "der.up.1.block.1.norm2.norm_layer.weight", "der.up.1.block.1.norm2.norm_layer.bias", "der.up.1.block.1.norm2.conv_y.conv.weight", "der.up.1.block.1.norm2.conv_y.conv.bias", "der.up.1.block.1.norm2.conv_b.conv.weight", "der.up.1.block.1.norm2.conv_b.conv.bias", "der.up.1.block.1.conv2.conv.weight", "der.up.1.block.1.conv2.conv.bias", "der.up.1.block.2.norm1.norm_layer.weight", "der.up.1.block.2.norm1.norm_layer.bias", "der.up.1.block.2.norm1.conv_y.conv.weight", "der.up.1.block.2.norm1.conv_y.conv.bias", 
"der.up.1.block.2.norm1.conv_b.conv.weight", "der.up.1.block.2.norm1.conv_b.conv.bias", "der.up.1.block.2.conv1.conv.weight", "der.up.1.block.2.conv1.conv.bias", "der.up.1.block.2.norm2.norm_layer.weight", "der.up.1.block.2.norm2.norm_layer.bias", "der.up.1.block.2.norm2.conv_y.conv.weight", "der.up.1.block.2.norm2.conv_y.conv.bias", "der.up.1.block.2.norm2.conv_b.conv.weight", "der.up.1.block.2.norm2.conv_b.conv.bias", "der.up.1.block.2.conv2.conv.weight", "der.up.1.block.2.conv2.conv.bias", "der.up.1.block.3.norm1.norm_layer.weight", "der.up.1.block.3.norm1.norm_layer.bias", "der.up.1.block.3.norm1.conv_y.conv.weight", "der.up.1.block.3.norm1.conv_y.conv.bias", "der.up.1.block.3.norm1.conv_b.conv.weight", "der.up.1.block.3.norm1.conv_b.conv.bias", "der.up.1.block.3.conv1.conv.weight", "der.up.1.block.3.conv1.conv.bias", "der.up.1.block.3.norm2.norm_layer.weight", "der.up.1.block.3.norm2.norm_layer.bias", "der.up.1.block.3.norm2.conv_y.conv.weight", "der.up.1.block.3.norm2.conv_y.conv.bias", "der.up.1.block.3.norm2.conv_b.conv.weight", "der.up.1.block.3.norm2.conv_b.conv.bias", "der.up.1.block.3.conv2.conv.weight", "der.up.1.block.3.conv2.conv.bias", "der.up.1.upsample.conv.weight", "der.up.1.upsample.conv.bias", "der.up.2.block.0.norm1.norm_layer.weight", "der.up.2.block.0.norm1.norm_layer.bias", "der.up.2.block.0.norm1.conv_y.conv.weight", "der.up.2.block.0.norm1.conv_y.conv.bias", "der.up.2.block.0.norm1.conv_b.conv.weight", "der.up.2.block.0.norm1.conv_b.conv.bias", "der.up.2.block.0.conv1.conv.weight", "der.up.2.block.0.conv1.conv.bias", "der.up.2.block.0.norm2.norm_layer.weight", "der.up.2.block.0.norm2.norm_layer.bias", "der.up.2.block.0.norm2.conv_y.conv.weight", "der.up.2.block.0.norm2.conv_y.conv.bias", "der.up.2.block.0.norm2.conv_b.conv.weight", "der.up.2.block.0.norm2.conv_b.conv.bias", "der.up.2.block.0.conv2.conv.weight", "der.up.2.block.0.conv2.conv.bias", "der.up.2.block.0.nin_shortcut.weight", "der.up.2.block.0.nin_shortcut.bias", 
"der.up.2.block.1.norm1.norm_layer.weight", "der.up.2.block.1.norm1.norm_layer.bias", "der.up.2.block.1.norm1.conv_y.conv.weight", "der.up.2.block.1.norm1.conv_y.conv.bias", "der.up.2.block.1.norm1.conv_b.conv.weight", "der.up.2.block.1.norm1.conv_b.conv.bias", "der.up.2.block.1.conv1.conv.weight", "der.up.2.block.1.conv1.conv.bias", "der.up.2.block.1.norm2.norm_layer.weight", "der.up.2.block.1.norm2.norm_layer.bias", "der.up.2.block.1.norm2.conv_y.conv.weight", "der.up.2.block.1.norm2.conv_y.conv.bias", "der.up.2.block.1.norm2.conv_b.conv.weight", "der.up.2.block.1.norm2.conv_b.conv.bias", "der.up.2.block.1.conv2.conv.weight", "der.up.2.block.1.conv2.conv.bias", "der.up.2.block.2.norm1.norm_layer.weight", "der.up.2.block.2.norm1.norm_layer.bias", "der.up.2.block.2.norm1.conv_y.conv.weight", "der.up.2.block.2.norm1.conv_y.conv.bias", "der.up.2.block.2.norm1.conv_b.conv.weight", "der.up.2.block.2.norm1.conv_b.conv.bias", "der.up.2.block.2.conv1.conv.weight", "der.up.2.block.2.conv1.conv.bias", "der.up.2.block.2.norm2.norm_layer.weight", "der.up.2.block.2.norm2.norm_layer.bias", "der.up.2.block.2.norm2.conv_y.conv.weight", "der.up.2.block.2.norm2.conv_y.conv.bias", "der.up.2.block.2.norm2.conv_b.conv.weight", "der.up.2.block.2.norm2.conv_b.conv.bias", "der.up.2.block.2.conv2.conv.weight", "der.up.2.block.2.conv2.conv.bias", "der.up.2.block.3.norm1.norm_layer.weight", "der.up.2.block.3.norm1.norm_layer.bias", "der.up.2.block.3.norm1.conv_y.conv.weight", "der.up.2.block.3.norm1.conv_y.conv.bias", "der.up.2.block.3.norm1.conv_b.conv.weight", "der.up.2.block.3.norm1.conv_b.conv.bias", "der.up.2.block.3.conv1.conv.weight", "der.up.2.block.3.conv1.conv.bias", "der.up.2.block.3.norm2.norm_layer.weight", "der.up.2.block.3.norm2.norm_layer.bias", "der.up.2.block.3.norm2.conv_y.conv.weight", "der.up.2.block.3.norm2.conv_y.conv.bias", "der.up.2.block.3.norm2.conv_b.conv.weight", "der.up.2.block.3.norm2.conv_b.conv.bias", "der.up.2.block.3.conv2.conv.weight", 
"der.up.2.block.3.conv2.conv.bias", "der.up.2.upsample.conv.weight", "der.up.2.upsample.conv.bias", "der.up.3.block.0.norm1.norm_layer.weight", "der.up.3.block.0.norm1.norm_layer.bias", "der.up.3.block.0.norm1.conv_y.conv.weight", "der.up.3.block.0.norm1.conv_y.conv.bias", "der.up.3.block.0.norm1.conv_b.conv.weight", "der.up.3.block.0.norm1.conv_b.conv.bias", "der.up.3.block.0.conv1.conv.weight", "der.up.3.block.0.conv1.conv.bias", "der.up.3.block.0.norm2.norm_layer.weight", "der.up.3.block.0.norm2.norm_layer.bias", "der.up.3.block.0.norm2.conv_y.conv.weight", "der.up.3.block.0.norm2.conv_y.conv.bias", "der.up.3.block.0.norm2.conv_b.conv.weight", "der.up.3.block.0.norm2.conv_b.conv.bias", "der.up.3.block.0.conv2.conv.weight", "der.up.3.block.0.conv2.conv.bias", "der.up.3.block.1.norm1.norm_layer.weight", "der.up.3.block.1.norm1.norm_layer.bias", "der.up.3.block.1.norm1.conv_y.conv.weight", "der.up.3.block.1.norm1.conv_y.conv.bias", "der.up.3.block.1.norm1.conv_b.conv.weight", "der.up.3.block.1.norm1.conv_b.conv.bias", "der.up.3.block.1.conv1.conv.weight", "der.up.3.block.1.conv1.conv.bias", "der.up.3.block.1.norm2.norm_layer.weight", "der.up.3.block.1.norm2.norm_layer.bias", "der.up.3.block.1.norm2.conv_y.conv.weight", "der.up.3.block.1.norm2.conv_y.conv.bias", "der.up.3.block.1.norm2.conv_b.conv.weight", "der.up.3.block.1.norm2.conv_b.conv.bias", "der.up.3.block.1.conv2.conv.weight", "der.up.3.block.1.conv2.conv.bias", "der.up.3.block.2.norm1.norm_layer.weight", "der.up.3.block.2.norm1.norm_layer.bias", "der.up.3.block.2.norm1.conv_y.conv.weight", "der.up.3.block.2.norm1.conv_y.conv.bias", "der.up.3.block.2.norm1.conv_b.conv.weight", "der.up.3.block.2.norm1.conv_b.conv.bias", "der.up.3.block.2.conv1.conv.weight", "der.up.3.block.2.conv1.conv.bias", "der.up.3.block.2.norm2.norm_layer.weight", "der.up.3.block.2.norm2.norm_layer.bias", "der.up.3.block.2.norm2.conv_y.conv.weight", "der.up.3.block.2.norm2.conv_y.conv.bias", "der.up.3.block.2.norm2.conv_b.conv.weight", 
"der.up.3.block.2.norm2.conv_b.conv.bias", "der.up.3.block.2.conv2.conv.weight", "der.up.3.block.2.conv2.conv.bias", "der.up.3.block.3.norm1.norm_layer.weight", "der.up.3.block.3.norm1.norm_layer.bias", "der.up.3.block.3.norm1.conv_y.conv.weight", "der.up.3.block.3.norm1.conv_y.conv.bias", "der.up.3.block.3.norm1.conv_b.conv.weight", "der.up.3.block.3.norm1.conv_b.conv.bias", "der.up.3.block.3.conv1.conv.weight", "der.up.3.block.3.conv1.conv.bias", "der.up.3.block.3.norm2.norm_layer.weight", "der.up.3.block.3.norm2.norm_layer.bias", "der.up.3.block.3.norm2.conv_y.conv.weight", "der.up.3.block.3.norm2.conv_y.conv.bias", "der.up.3.block.3.norm2.conv_b.conv.weight", "der.up.3.block.3.norm2.conv_b.conv.bias", "der.up.3.block.3.conv2.conv.weight", "der.up.3.block.3.conv2.conv.bias", "der.up.3.upsample.conv.weight", "der.up.3.upsample.conv.bias", "der.norm_out.norm_layer.weight", "der.norm_out.norm_layer.bias", "der.norm_out.conv_y.conv.weight", "der.norm_out.conv_y.conv.bias", "der.norm_out.conv_b.conv.weight", "der.norm_out.conv_b.conv.bias".
How can I fix this error? I compared the state-dict keys of the two models, and they differ.
I don't understand why the model's state-dict keys would change after fine-tuning.
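To see exactly which keys differ between the base and fine-tuned checkpoints, a minimal sketch like the following can help (this is not the project's official tooling; the `module` unwrapping reflects how SAT/DeepSpeed checkpoints are usually nested, and the file paths in the comments are placeholders):

```python
def state_dict_keys(ckpt):
    """Return the weight keys, unwrapping SAT/DeepSpeed's 'module' nesting if present."""
    if isinstance(ckpt, dict) and "module" in ckpt:
        ckpt = ckpt["module"]
    return set(ckpt.keys())

def diff_keys(base_ckpt, tuned_ckpt):
    """Return (keys only in base, keys only in fine-tuned), sorted."""
    base, tuned = state_dict_keys(base_ckpt), state_dict_keys(tuned_ckpt)
    return sorted(base - tuned), sorted(tuned - base)

# With real files (paths are placeholders), load each checkpoint first:
#   import torch
#   base  = torch.load("base/mp_rank_00_model_states.pt",  map_location="cpu")
#   tuned = torch.load("tuned/mp_rank_00_model_states.pt", map_location="cpu")
#   only_base, only_tuned = diff_keys(base, tuned)

# Toy demonstration with in-memory dicts:
only_base, only_tuned = diff_keys(
    {"module": {"decoder.conv.weight": 0, "decoder.conv.bias": 0}},
    {"module": {"conv.weight": 0, "conv.bias": 0}},
)
print(only_base)   # keys present only in the base checkpoint
print(only_tuned)  # keys present only in the fine-tuned checkpoint
```

If the two key sets differ only by a common prefix, that usually points at a wrapper added (or removed) by the training harness rather than a real architecture change.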
https://github.com/THUDM/CogVideo/pull/268 (Exporting Hugging Face Diffusers LoRA weights from SAT checkpoints)
You can also try the operation in this Diffusers draft PR: https://github.com/huggingface/diffusers/pull/9412
@glide-the Let me give it a try. Thanks for the advice!
But my issue occurs not only with LoRA fine-tuning, but also with full-parameter fine-tuning.
If you are doing full-parameter fine-tuning, you can use the default conversion script directly: https://github.com/THUDM/CogVideo/blob/main/tools/convert_weight_sat2hf.py
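If the conversion script still reports missing keys and the key diff shows the fine-tuned checkpoint's keys differ from the base only by a prefix (for example an extra wrapper added by the training harness), remapping the keys before running the converter is a common workaround. The sketch below is a generic helper, not part of the CogVideo tooling, and the `"model."` prefix in the usage example is only an illustration; use whatever prefix your own key diff shows:

```python
def remap_keys(state_dict, strip_prefix="", add_prefix=""):
    """Strip and/or prepend a prefix on every key of a state dict."""
    remapped = {}
    for key, value in state_dict.items():
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        remapped[add_prefix + key] = value
    return remapped

# Illustrative only: strip a hypothetical "model." wrapper prefix.
fixed = remap_keys({"model.decoder.conv.weight": 0}, strip_prefix="model.")
print(fixed)  # {'decoder.conv.weight': 0}
```

After remapping, save the checkpoint back out (e.g. with `torch.save`) and point `--transformer_ckpt_path` at the fixed file.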
@KihongK I don't see an error log from your SFT (full-parameter) fine-tuning run.
I hit the same issue when converting a full-parameter fine-tuned checkpoint. Have you solved it?