motion_puzzle errors occur when loading the pretrained model

Hi @DK-Jang! Thank you for sharing this nice work :-) I ran into trouble when loading the pretrained model. PyTorch maps the checkpoint to a device that is set by the following line:

self.device = torch.cuda.current_device()

https://github.com/DK-Jang/motion_puzzle/blob/7f1eca9acaf17b84d0d75ca509dbce8c5f9d8472/trainer.py#L33

In my case it returns 0, which triggers an error:

TypeError: 'int' object is not callable

When I change this line to self.device = torch.device('cuda:0'), the error message changes to:

RuntimeError: Error(s) in loading state_dict for DataParallel:
	Missing key(s) in state_dict: "module.enc_content.edge_importance_j", ...
	Unexpected key(s) in state_dict: "enc_content.edge_importance_j", ...

I think this is because the model was trained and saved with data parallelism; however, I cannot run on multiple GPUs.

Could you please help? Thanks in advance!
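A note on reading this kind of error: in a `load_state_dict` message, the "Missing" keys are the ones the model expects and the "Unexpected" keys are the ones found in the checkpoint. In the message above, the `module.` prefix is therefore on the model side (the `DataParallel` wrapper), not on the checkpoint side. A minimal sketch for checking this from the two key lists follows; `diagnose_prefix` is a hypothetical helper and not part of the repository.

```python
def diagnose_prefix(model_keys, checkpoint_keys, prefix="module."):
    """Report which side of a state_dict mismatch carries the wrapper prefix.

    Hypothetical helper for illustration; it only inspects key names.
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    # A side counts as "wrapped" only if it is non-empty and every key has the prefix.
    model_wrapped = bool(model_keys) and all(k.startswith(prefix) for k in model_keys)
    ckpt_wrapped = bool(checkpoint_keys) and all(k.startswith(prefix) for k in checkpoint_keys)
    if model_wrapped and not ckpt_wrapped:
        return "add prefix to checkpoint keys"
    if ckpt_wrapped and not model_wrapped:
        return "strip prefix from checkpoint keys"
    return "no prefix mismatch"


# Key names taken from the error message above.
result = diagnose_prefix(
    ["module.enc_content.edge_importance_j"],  # "Missing": what the model expects
    ["enc_content.edge_importance_j"],         # "Unexpected": what the checkpoint holds
)
print(result)
```

With real objects, the two lists would come from `model.state_dict().keys()` and from the keys of the loaded checkpoint dictionary.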

Sep 20 '22 03:09 zzzark

I faced the same problem. I think the pretrained network was trained without DataParallel (or on CPU). Modifying the code as follows worked for me:

In trainer.py https://github.com/DK-Jang/motion_puzzle/blob/7f1eca9acaf17b84d0d75ca509dbce8c5f9d8472/trainer.py#L15

from collections import OrderedDict

https://github.com/DK-Jang/motion_puzzle/blob/7f1eca9acaf17b84d0d75ca509dbce8c5f9d8472/trainer.py#L33

self.device = torch.device("cuda:{}".format(torch.cuda.current_device()))

https://github.com/DK-Jang/motion_puzzle/blob/7f1eca9acaf17b84d0d75ca509dbce8c5f9d8472/trainer.py#L163-L164

        gen_dict = OrderedDict()
        for key, value in state_dict["gen"].items():
            if not key.startswith("module."):
                key = "module." + key
            gen_dict[key] = value
        self.gen.load_state_dict(gen_dict)
        gen_ema_dict = OrderedDict()
        for key, value in state_dict["gen_ema"].items():
            if not key.startswith("module."):
                key = "module." + key
            gen_ema_dict[key] = value
        self.gen_ema.load_state_dict(gen_ema_dict)

In test.py https://github.com/DK-Jang/motion_puzzle/blob/7f1eca9acaf17b84d0d75ca509dbce8c5f9d8472/test.py#L128-L131

        rec = rec.cpu().numpy()*std + mean
        tra = tra.cpu().numpy()*std + mean
        con_gt = con_gt.cpu().numpy()*std + mean
        sty_gt = sty_gt.cpu().numpy()*std + mean

If you want to retrain the model, these changes must be reverted.
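The two loops in the trainer.py change above do the same thing for "gen" and "gen_ema", so they could be folded into one helper. The following is only a sketch of that idea; `with_module_prefix` is a hypothetical name, and the demo uses plain values instead of tensors so that it runs without PyTorch.

```python
from collections import OrderedDict


def with_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict whose keys all start with `prefix`.

    Keys that already carry the prefix are left unchanged, so the helper
    is safe to apply to a checkpoint saved from a wrapped model as well.
    """
    renamed = OrderedDict()
    for key, value in state_dict.items():
        if not key.startswith(prefix):
            key = prefix + key
        renamed[key] = value
    return renamed


# Demo with placeholder values; real checkpoints map names to tensors.
checkpoint = OrderedDict(
    [("enc_content.edge_importance_j", 1), ("module.already_prefixed", 2)]
)
fixed = with_module_prefix(checkpoint)
print(list(fixed.keys()))

# Assumed usage inside trainer.py, mirroring the snippet above:
#   self.gen.load_state_dict(with_module_prefix(state_dict["gen"]))
#   self.gen_ema.load_state_dict(with_module_prefix(state_dict["gen_ema"]))
```

The helper returns a new dictionary and leaves the loaded checkpoint untouched, which makes the change easier to revert before retraining.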

Sep 23 '22 05:09 KosukeFukazawa

I had the same problem. Here is what I tried:

Change this part: https://github.com/DK-Jang/motion_puzzle/blob/52af967040c42c5eb37c48b03d412e81e7b37def/trainer.py#L34-L35 to:

self.gen = self.gen.to(self.device)
self.gen_ema = self.gen_ema.to(self.device)

And change this line: https://github.com/DK-Jang/motion_puzzle/blob/52af967040c42c5eb37c48b03d412e81e7b37def/trainer.py#L162 to:

state_dict = torch.load(model_path, map_location="cuda:0")

Then make the same change in test.py as described in KosukeFukazawa's reply.

Nov 28 '22 20:11 PerfectBlueFeynman
