
getting an error while running inference.py

Open athern27 opened this issue 11 months ago • 4 comments

I am getting this error while running inference.py. I have tried reinstalling torch and torchvision with different versions, but nothing seems to work.

(dope_training) add@add-MS-7C84:~/kick_blenderproc/Deep_Object_Pose/train$ python ../inference/inference.py --weights output/weights/net_epoch_60.pth --data palletjack_data_test/ --object palletjack
/home/add/kick_blenderproc/Deep_Object_Pose/train/output/dope_training/lib/python3.8/site-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 2.0.2 (you have 1.4.18). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
  check_for_updates()
Found 1 weights.
Loading DOPE model 'output/weights/net_epoch_60.pth'...
/home/add/kick_blenderproc/Deep_Object_Pose/train/output/dope_training/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/add/kick_blenderproc/Deep_Object_Pose/train/output/dope_training/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=None.
  warnings.warn(msg)
/home/add/kick_blenderproc/Deep_Object_Pose/train/../common/detector.py:274: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  net.load_state_dict(torch.load(path, map_location=device))
Traceback (most recent call last):
  File "../inference/inference.py", line 258, in <module>
    dope_node = DopeNode(config, weight, opt.parallel, opt.object)
  File "../inference/inference.py", line 49, in __init__
    self.model.load_net_model()
  File "/home/addb/kick_blenderproc/Deep_Object_Pose/train/../common/detector.py", line 253, in load_net_model
    self.net = self.load_net_model_path(self.net_path)
  File "/home/add/kick_blenderproc/Deep_Object_Pose/train/../common/detector.py", line 274, in load_net_model_path
    net.load_state_dict(torch.load(path, map_location=device))
  File "/home/add/kick_blenderproc/Deep_Object_Pose/train/output/dope_training/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DopeNetwork:
    Missing key(s) in state_dict: "vgg.0.weight", "vgg.0.bias", "vgg.2.weight", "vgg.2.bias", "vgg.5.weight", "vgg.5.bias", "vgg.7.weight", "vgg.7.bias", "vgg.10.weight", "vgg.10.bias", "vgg.12.weight", "vgg.12.bias", "vgg.14.weight", "vgg.14.bias", "vgg.16.weight", "vgg.16.bias", "vgg.19.weight", "vgg.19.bias", "vgg.21.weight", "vgg.21.bias", "vgg.23.weight", "vgg.23.bias", "vgg.25.weight", "vgg.25.bias", "m1_2.0.weight", "m1_2.0.bias", "m1_2.2.weight", "m1_2.2.bias", "m1_2.4.weight", "m1_2.4.bias", "m1_2.6.weight", "m1_2.6.bias", "m1_2.8.weight", "m1_2.8.bias", "m2_2.0.weight", "m2_2.0.bias", "m2_2.2.weight", "m2_2.2.bias", "m2_2.4.weight", "m2_2.4.bias", "m2_2.6.weight", "m2_2.6.bias", "m2_2.8.weight", "m2_2.8.bias", "m2_2.10.weight", "m2_2.10.bias", "m2_2.12.weight", "m2_2.12.bias", "m3_2.0.weight", "m3_2.0.bias", "m3_2.2.weight", "m3_2.2.bias", "m3_2.4.weight", "m3_2.4.bias", "m3_2.6.weight", "m3_2.6.bias", "m3_2.8.weight", "m3_2.8.bias", "m3_2.10.weight", "m3_2.10.bias", "m3_2.12.weight", "m3_2.12.bias", "m4_2.0.weight", "m4_2.0.bias", "m4_2.2.weight", "m4_2.2.bias", "m4_2.4.weight", "m4_2.4.bias", "m4_2.6.weight", "m4_2.6.bias", "m4_2.8.weight", "m4_2.8.bias", "m4_2.10.weight", "m4_2.10.bias", "m4_2.12.weight", "m4_2.12.bias", "m5_2.0.weight", "m5_2.0.bias", "m5_2.2.weight", "m5_2.2.bias", "m5_2.4.weight", "m5_2.4.bias", "m5_2.6.weight", "m5_2.6.bias", "m5_2.8.weight", "m5_2.8.bias", "m5_2.10.weight", "m5_2.10.bias", "m5_2.12.weight", "m5_2.12.bias", "m6_2.0.weight", "m6_2.0.bias", "m6_2.2.weight", "m6_2.2.bias", "m6_2.4.weight", "m6_2.4.bias", "m6_2.6.weight", "m6_2.6.bias", "m6_2.8.weight", "m6_2.8.bias", "m6_2.10.weight", "m6_2.10.bias", "m6_2.12.weight", "m6_2.12.bias", "m1_1.0.weight", "m1_1.0.bias", "m1_1.2.weight", "m1_1.2.bias", "m1_1.4.weight", "m1_1.4.bias", "m1_1.6.weight", "m1_1.6.bias", "m1_1.8.weight", "m1_1.8.bias", "m2_1.0.weight", "m2_1.0.bias", "m2_1.2.weight", "m2_1.2.bias", "m2_1.4.weight", "m2_1.4.bias", "m2_1.6.weight", "m2_1.6.bias", "m2_1.8.weight", "m2_1.8.bias", "m2_1.10.weight", "m2_1.10.bias", "m2_1.12.weight", "m2_1.12.bias", "m3_1.0.weight", "m3_1.0.bias", "m3_1.2.weight", "m3_1.2.bias", "m3_1.4.weight", "m3_1.4.bias", "m3_1.6.weight", "m3_1.6.bias", "m3_1.8.weight", "m3_1.8.bias", "m3_1.10.weight", "m3_1.10.bias", "m3_1.12.weight", "m3_1.12.bias", "m4_1.0.weight", "m4_1.0.bias", "m4_1.2.weight", "m4_1.2.bias", "m4_1.4.weight", "m4_1.4.bias", "m4_1.6.weight", "m4_1.6.bias", "m4_1.8.weight", "m4_1.8.bias", "m4_1.10.weight", "m4_1.10.bias", "m4_1.12.weight", "m4_1.12.bias", "m5_1.0.weight", "m5_1.0.bias", "m5_1.2.weight", "m5_1.2.bias",
"m5_1.4.weight", "m5_1.4.bias", "m5_1.6.weight", "m5_1.6.bias", "m5_1.8.weight", "m5_1.8.bias", "m5_1.10.weight", "m5_1.10.bias", "m5_1.12.weight", "m5_1.12.bias", "m6_1.0.weight", "m6_1.0.bias", "m6_1.2.weight", "m6_1.2.bias", "m6_1.4.weight", "m6_1.4.bias", "m6_1.6.weight", "m6_1.6.bias", "m6_1.8.weight", "m6_1.8.bias", "m6_1.10.weight", "m6_1.10.bias", "m6_1.12.weight", "m6_1.12.bias". Unexpected key(s) in state_dict: "module.vgg.0.weight", "module.vgg.0.bias", "module.vgg.2.weight", "module.vgg.2.bias", "module.vgg.5.weight", "module.vgg.5.bias", "module.vgg.7.weight", "module.vgg.7.bias", "module.vgg.10.weight", "module.vgg.10.bias", "module.vgg.12.weight", "module.vgg.12.bias", "module.vgg.14.weight", "module.vgg.14.bias", "module.vgg.16.weight", "module.vgg.16.bias", "module.vgg.19.weight", "module.vgg.19.bias", "module.vgg.21.weight", "module.vgg.21.bias", "module.vgg.23.weight", "module.vgg.23.bias", "module.vgg.25.weight", "module.vgg.25.bias", "module.m1_2.0.weight", "module.m1_2.0.bias", "module.m1_2.2.weight", "module.m1_2.2.bias", "module.m1_2.4.weight", "module.m1_2.4.bias", "module.m1_2.6.weight", "module.m1_2.6.bias", "module.m1_2.8.weight", "module.m1_2.8.bias", "module.m2_2.0.weight", "module.m2_2.0.bias", "module.m2_2.2.weight", "module.m2_2.2.bias", "module.m2_2.4.weight", "module.m2_2.4.bias", "module.m2_2.6.weight", "module.m2_2.6.bias", "module.m2_2.8.weight", "module.m2_2.8.bias", "module.m2_2.10.weight", "module.m2_2.10.bias", "module.m2_2.12.weight", "module.m2_2.12.bias", "module.m3_2.0.weight", "module.m3_2.0.bias", "module.m3_2.2.weight", "module.m3_2.2.bias", "module.m3_2.4.weight", "module.m3_2.4.bias", "module.m3_2.6.weight", "module.m3_2.6.bias", "module.m3_2.8.weight", "module.m3_2.8.bias", "module.m3_2.10.weight", "module.m3_2.10.bias", "module.m3_2.12.weight", "module.m3_2.12.bias", "module.m4_2.0.weight", "module.m4_2.0.bias", "module.m4_2.2.weight", "module.m4_2.2.bias", "module.m4_2.4.weight", "module.m4_2.4.bias", "module.m4_2.6.weight", "module.m4_2.6.bias", "module.m4_2.8.weight", "module.m4_2.8.bias", "module.m4_2.10.weight", "module.m4_2.10.bias", "module.m4_2.12.weight", "module.m4_2.12.bias", "module.m5_2.0.weight", "module.m5_2.0.bias", "module.m5_2.2.weight", "module.m5_2.2.bias", "module.m5_2.4.weight", "module.m5_2.4.bias", "module.m5_2.6.weight", "module.m5_2.6.bias", "module.m5_2.8.weight", "module.m5_2.8.bias", "module.m5_2.10.weight", "module.m5_2.10.bias", "module.m5_2.12.weight", "module.m5_2.12.bias", "module.m6_2.0.weight", "module.m6_2.0.bias", "module.m6_2.2.weight", "module.m6_2.2.bias", "module.m6_2.4.weight", "module.m6_2.4.bias", "module.m6_2.6.weight", "module.m6_2.6.bias", "module.m6_2.8.weight", "module.m6_2.8.bias", "module.m6_2.10.weight", "module.m6_2.10.bias", "module.m6_2.12.weight", "module.m6_2.12.bias", "module.m1_1.0.weight", "module.m1_1.0.bias", "module.m1_1.2.weight", "module.m1_1.2.bias", "module.m1_1.4.weight", "module.m1_1.4.bias", "module.m1_1.6.weight", "module.m1_1.6.bias", "module.m1_1.8.weight", "module.m1_1.8.bias", "module.m2_1.0.weight", "module.m2_1.0.bias", "module.m2_1.2.weight", "module.m2_1.2.bias", "module.m2_1.4.weight", "module.m2_1.4.bias", "module.m2_1.6.weight", "module.m2_1.6.bias", "module.m2_1.8.weight", "module.m2_1.8.bias", "module.m2_1.10.weight", "module.m2_1.10.bias", "module.m2_1.12.weight", "module.m2_1.12.bias", "module.m3_1.0.weight", "module.m3_1.0.bias", "module.m3_1.2.weight", "module.m3_1.2.bias", "module.m3_1.4.weight", "module.m3_1.4.bias", "module.m3_1.6.weight", 
"module.m3_1.6.bias", "module.m3_1.8.weight", "module.m3_1.8.bias", "module.m3_1.10.weight", "module.m3_1.10.bias", "module.m3_1.12.weight", "module.m3_1.12.bias", "module.m4_1.0.weight", "module.m4_1.0.bias", "module.m4_1.2.weight", "module.m4_1.2.bias", "module.m4_1.4.weight", "module.m4_1.4.bias", "module.m4_1.6.weight", "module.m4_1.6.bias", "module.m4_1.8.weight", "module.m4_1.8.bias", "module.m4_1.10.weight", "module.m4_1.10.bias", "module.m4_1.12.weight", "module.m4_1.12.bias", "module.m5_1.0.weight", "module.m5_1.0.bias", "module.m5_1.2.weight", "module.m5_1.2.bias", "module.m5_1.4.weight", "module.m5_1.4.bias", "module.m5_1.6.weight", "module.m5_1.6.bias", "module.m5_1.8.weight", "module.m5_1.8.bias", "module.m5_1.10.weight", "module.m5_1.10.bias", "module.m5_1.12.weight", "module.m5_1.12.bias", "module.m6_1.0.weight", "module.m6_1.0.bias", "module.m6_1.2.weight", "module.m6_1.2.bias", "module.m6_1.4.weight", "module.m6_1.4.bias", "module.m6_1.6.weight", "module.m6_1.6.bias", "module.m6_1.8.weight", "module.m6_1.8.bias", "module.m6_1.10.weight", "module.m6_1.10.bias", "module.m6_1.12.weight", "module.m6_1.12.bias".

I also downgraded torchvision to 0.12.1 as per this, but it said that my RTX 3090 doesn't support this version. Kindly help me with this matter.

athern27 avatar Feb 03 '25 10:02 athern27

I have the same question. Did you solve this problem?

liuwenchao12480 avatar Mar 03 '25 08:03 liuwenchao12480

I think you need to downgrade torch. Someone else had a similar problem; check the other issues.

TontonTremblay avatar Mar 03 '25 18:03 TontonTremblay

@liuwenchao12480 No, I was not able to solve this problem. I tried downgrading, but my GPU didn't support that version, so I tried different versions of torch and torchvision. It still didn't work, so I gave up. If you are able to resolve it, please do tell me.

athern27 avatar Mar 05 '25 10:03 athern27

Check with ChatGPT on how to rename the keys so they match!

TontonTremblay avatar Mar 05 '25 17:03 TontonTremblay

The reason for this issue is that you trained on multiple GPUs: DataParallel wraps the network and adds a module. prefix to every key in the saved state dict. You only need to disable multi-GPU training during the training process, or strip the prefix when loading. @liuwenchao12480
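
A minimal sketch of stripping the prefix at load time (illustrative only, not the repo's code; net stands for a plain DopeNetwork instance, as constructed in common/detector.py):

import torch

# Checkpoint saved through torch.nn.DataParallel: every key carries a
# "module." prefix, exactly the pattern in the error above.
state_dict = torch.load("output/weights/net_epoch_60.pth", map_location="cpu")

# Strip the prefix so the keys match an unwrapped network.
# (k[len("module."):] instead of str.removeprefix, which needs Python 3.9+.)
state_dict = {
    (k[len("module."):] if k.startswith("module.") else k): v
    for k, v in state_dict.items()
}

# net.load_state_dict(state_dict)  # net: a plain DopeNetwork instance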

czy1998916 avatar Apr 02 '25 02:04 czy1998916

I ran into the same problem while trying to execute the example command python inference.py --weights ../weights --data ../sample_data --object cracker

Trinitro-Dopamine avatar Apr 10 '25 07:04 Trinitro-Dopamine

The code base is not maintained anymore, happy to accept a PR for this problem though. https://chatgpt.com/share/67f7df12-2c10-8013-8af1-db769cfc7a85 please check this to fix your problem.

TontonTremblay avatar Apr 10 '25 15:04 TontonTremblay

To complete @TontonTremblay's answer, the issue is quite simple: the error is caused by the saved checkpoint's key names not matching what the model loader expects — each key carries the module. prefix added by multi-GPU (DataParallel) training, as noted above.

I created this script with ChatGPT. It converts all the keys in the weight files (.pth) to the names the model loader expects.


#!/usr/bin/env python3
import os
import torch
from collections import OrderedDict

WEIGHTS_DIR = "./output/weights"

for filename in os.listdir(WEIGHTS_DIR):
    if filename.endswith(".pth") and filename.startswith("net_epoch_"):
        file_path = os.path.join(WEIGHTS_DIR, filename)
        print(f"Processing {file_path} ...")

        # Load original state dict
        state_dict = torch.load(file_path, map_location="cpu")

        # Only convert if 'module.' prefix exists
        keys = list(state_dict.keys())
        if keys[0].startswith("module."):
            new_state_dict = OrderedDict()
            for k, v in state_dict.items():
                name = k[7:] if k.startswith("module.") else k
                new_state_dict[name] = v

            # Save new file with _cleaned suffix
            cleaned_path = os.path.join(WEIGHTS_DIR, filename.replace(".pth", "_cleaned.pth"))
            torch.save(new_state_dict, cleaned_path)
            print(f"Saved cleaned weights to {cleaned_path}")
        else:
            print(f"No 'module.' prefix found in {filename}, skipping.")

WilliamBonilla62 avatar May 30 '25 03:05 WilliamBonilla62

This is what the --parallel flag is for in inference.py
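
For example, reusing the command from the original report with the flag added (assuming the checkpoint still carries the module. prefix):

python ../inference/inference.py --weights output/weights/net_epoch_60.pth --data palletjack_data_test/ --object palletjack --parallel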

nv-jeff avatar Jun 16 '25 18:06 nv-jeff