
During training: raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format, asking for guidance

Open yueool opened this issue 1 year ago • 3 comments

Training command: python tasks/run.py --config=egs/datasets/x6/lm3d_radnerf_sr.yaml --exp_name=motion2video_nerf/may_head --reset


Traceback (most recent call last):
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 151, in fit
    self.run_single_process(self.task)
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 209, in run_single_process
    self.restore_weights(checkpoint)
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 476, in restore_weights
    getattr(task_ref, k).load_state_dict(v, strict=True)
  File "D:\GeneFacePlusPlus_py39\python\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RADNeRFwithSR:
    size mismatch for blink_encoder.1.weight: copying a param with shape torch.Size([8, 32]) from checkpoint, the shape in current model is torch.Size([2, 32]).
    size mismatch for blink_encoder.1.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([2]).
'pkill' is not recognized as an internal or external command, operable program or batch file.
Traceback (most recent call last):
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 151, in fit
    self.run_single_process(self.task)
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 209, in run_single_process
    self.restore_weights(checkpoint)
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 476, in restore_weights
    getattr(task_ref, k).load_state_dict(v, strict=True)
  File "D:\GeneFacePlusPlus_py39\python\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RADNeRFwithSR:
    size mismatch for blink_encoder.1.weight: copying a param with shape torch.Size([8, 32]) from checkpoint, the shape in current model is torch.Size([2, 32]).
    size mismatch for blink_encoder.1.bias: copying a param with shape torch.Size([8]) from checkpoint, the shape in current model is torch.Size([2]).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\GeneFacePlusPlus_py39\tasks\run.py", line 28, in <module>
    run_task()
  File "D:\GeneFacePlusPlus_py39\tasks\run.py", line 16, in run_task
    task_cls.start()
  File "D:\GeneFacePlusPlus_py39\utils\commons\base_task.py", line 272, in start
    trainer.fit(cls)
  File "D:\GeneFacePlusPlus_py39\utils\commons\trainer.py", line 156, in fit
    subprocess.check_call(f'pkill -f "GeneFace_worker ({hparams["work_dir"]}"', shell=True)
  File "D:\GeneFacePlusPlus_py39\python\lib\subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'pkill -f "GeneFace_worker (checkpoints/motion2video_nerf/may_head"' returned non-zero exit status 1.

My environment: Windows 10, Python 3.9, torch 2.0.1

I'm stuck here and can't get past it. Any guidance would be appreciated.

yueool, Apr 18 '24 17:04
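
A note on the second traceback in the post above: `pkill` is a Unix utility, so on Windows the cleanup call in `trainer.py` fails and raises `CalledProcessError`. That error is secondary; the size mismatch in `load_state_dict` is what actually stopped training. Below is a minimal sketch of a guard that skips the cleanup when `pkill` is unavailable. The function name `kill_workers` and its `which` parameter are hypothetical and not part of GeneFacePlusPlus.

```python
import shutil
import subprocess


def kill_workers(pattern, which=shutil.which):
    """Run `pkill -f <pattern>` only if pkill is on PATH.

    Returns True if the command was invoked, False if it was skipped.
    `which` is injectable so the lookup can be replaced in tests.
    """
    if which("pkill") is None:
        # Typical on Windows: there is nothing to call, so skip quietly
        # instead of raising and hiding the original training error.
        return False
    # check=False: pkill exits non-zero when no process matches,
    # which should not be treated as a failure here.
    subprocess.run(["pkill", "-f", pattern], check=False)
    return True
```

With a guard like this, the only traceback left on Windows would be the original `RuntimeError`.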

I ran into the same problem. I worked around it with non-strict matching; for the exact method, see my git.

abinggo, Apr 23 '24 05:04
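
On the non-strict matching mentioned above: the commenter's actual code is not shown in this thread, so the following is only a sketch of one common approach. Note that in PyTorch, `load_state_dict(..., strict=False)` tolerates missing and unexpected keys but still raises on a size mismatch, so mismatched entries have to be filtered out first. The helper names `split_by_shape` and `load_compatible` are hypothetical.

```python
def split_by_shape(ckpt_state, model_state):
    """Split checkpoint entries into those that fit the model and those that do not.

    Both arguments map parameter names to objects with a `.shape`
    attribute (for example, torch tensors).
    """
    compatible, skipped = {}, []
    for name, tensor in ckpt_state.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            compatible[name] = tensor
        else:
            # Missing from the model, or present with a different shape.
            skipped.append(name)
    return compatible, skipped


def load_compatible(module, ckpt_state):
    """Load only the shape-compatible entries into a torch.nn.Module.

    Returns the names that were skipped; those parameters keep their
    freshly initialized values.
    """
    compatible, skipped = split_by_shape(ckpt_state, module.state_dict())
    module.load_state_dict(compatible, strict=False)
    return skipped
```

For the error in this issue, the skipped names would be `blink_encoder.1.weight` and `blink_encoder.1.bias`, which means the blink encoder's last layer would start from random initialization rather than from the checkpoint.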

This is most likely an eye_blink_dim setting issue. Open egs\datasets\May\lm3d_radnerf_sr.yaml and change it as follows:

eye_blink_dim: 8
https://github.com/yerfor/GeneFacePlusPlus/pull/2
Change the 2 to 8 and it works. In fact, the problem does not occur if you don't use its original checkpoints.

raymondren1982, Jun 17 '24 03:06
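
To check which value the checkpoint expects before editing the YAML, one can read the first dimension of `blink_encoder.1.weight`. This is a sketch under the assumption, inferred from the error message and the fix above, that this dimension corresponds to `eye_blink_dim`. The helper name `first_dim` is hypothetical.

```python
def first_dim(state_dict, key="blink_encoder.1.weight"):
    """Return the size of dimension 0 of the named parameter.

    `state_dict` maps parameter names to objects with a `.shape`
    attribute (for example, torch tensors).
    """
    return int(state_dict[key].shape[0])


# Usage sketch (commented out because it needs torch and a real file).
# The checkpoint path and where the model weights sit inside the loaded
# object are assumptions; inspect the loaded dict's keys to find them.
#
#   import torch
#   ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
#   model_state = ...  # locate the model's parameter dict inside ckpt
#   print(first_dim(model_state))
```

For the checkpoint in this issue the reported shape is `torch.Size([8, 32])`, which is consistent with setting `eye_blink_dim: 8`.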

Thank you very much, I'll try it right away. You're great!

yueool, Jun 19 '24 17:06