Model structure mismatch when running LoRA training on the distilled model

Open schhaohao opened this issue 6 months ago • 0 comments

[2025-06-17 02:20:17] Resume from checkpoint /home/24031212058/modelhub/hunyuan-v1-2-distilled/pytorch_model_distill.pt
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/24031212058/HunyuanDiT/hydit/train_deepspeed.py", line 691, in <module>
[rank0]:     main(get_args())
[rank0]:   File "/home/24031212058/HunyuanDiT/hydit/train_deepspeed.py", line 448, in main
[rank0]:     model, ema, start_epoch, start_epoch_step, train_steps = model_resume(
[rank0]:                                                              ^^^^^^^^^^^^^
[rank0]:   File "/home/24031212058/HunyuanDiT/hydit/utils/tools.py", line 184, in model_resume
[rank0]:     model.load_state_dict(resume_ckpt, strict=args.strict)
[rank0]:   File "/home/24031212058/HunyuanDiT/hydit/modules/fp16_layers.py", line 90, in load_state_dict
[rank0]:     self.module.load_state_dict(state_dict, strict=strict)
[rank0]:   File "/home/24031212058/env/hunyuan/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
[rank0]:     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
[rank0]: RuntimeError: Error(s) in loading state_dict for HunYuanDiT:
[rank0]:     size mismatch for extra_embedder.0.weight: copying a param with shape torch.Size([5632, 3968]) from checkpoint, the shape in current model is torch.Size([5632, 1024]).

The error still occurs after adding --no-strict.
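
For context: in PyTorch, `load_state_dict(..., strict=False)` only tolerates missing and unexpected keys. A key that exists on both sides with a different shape still raises the `size mismatch` error shown in the traceback, which is consistent with the flag having no effect here. The sketch below is a diagnostic only, not a fix taken from the HunyuanDiT code base: the helper name `split_by_shape` is hypothetical, and it assumes both arguments are plain dicts whose values expose a `.shape` attribute (as `torch.Tensor` does). It lists which checkpoint entries disagree with the current model.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helper, not part of HunyuanDiT or PyTorch.
Mismatch = Tuple[str, tuple, tuple]  # (name, checkpoint shape, model shape)


def split_by_shape(
    model_state: Dict[str, Any], ckpt_state: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[Mismatch]]:
    """Split checkpoint entries into loadable ones and shape mismatches.

    Entries whose name is absent from the model are dropped silently,
    since strict=False would ignore them anyway.
    """
    compatible: Dict[str, Any] = {}
    mismatched: List[Mismatch] = []
    for name, value in ckpt_state.items():
        if name not in model_state:
            continue  # unexpected key in the checkpoint
        ckpt_shape = tuple(value.shape)
        model_shape = tuple(model_state[name].shape)
        if ckpt_shape == model_shape:
            compatible[name] = value
        else:
            mismatched.append((name, ckpt_shape, model_shape))
    return compatible, mismatched
```

With real tensors this would be called as `split_by_shape(model.state_dict(), torch.load(path, map_location="cpu"))`, and the `compatible` dict could then be passed to `model.load_state_dict(compatible, strict=False)`. Note that doing so leaves every mismatched layer (here `extra_embedder.0.weight`) at its initial values, so it only helps to locate the problem. The differing input width (3968 in the checkpoint versus 1024 in the model) suggests the model is being built with a different configuration than the one the distilled checkpoint was saved with, and aligning that configuration is the more likely real fix.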

Jun 17 '25 02:06 schhaohao