fasterrcnn-pytorch-training-pipeline

Cannot load the weights after training with custom data

Open • PhucPhamSy opened this issue 10 months ago • 9 comments

Hi, after training the model with distributed training, I cannot load the weights file "last_model_state.pth". I trained with the model "fasterrcnn_resnet50_fpn".

Traceback (most recent call last):
  File "/raid/data/phuc/detection/fastercnn-pytorch-training-pipeline/test_load_weight.py", line 37, in <module>
    model.load_state_dict(checkpoint['model_state_dict'])
  File "/home/ubuntu/anaconda3/envs/fastercnn/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FasterRCNN:
    Missing key(s) in state_dict:

PhucPhamSy • Apr 11 '24 07:04
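The traceback above is cut off right after "Missing key(s) in state_dict:", so the actual key names are not visible. A minimal diagnostic sketch, using only standard PyTorch, is to load the checkpoint non-strictly and print both lists that `load_state_dict` reports. The helper name `compare_keys` is hypothetical and not part of this repo, and a toy `nn.Linear` stands in for FasterRCNN so the sketch runs on its own.

```python
from __future__ import annotations

import torch.nn as nn


def compare_keys(model: nn.Module, state_dict) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (missing_keys, unexpected_keys) for a checkpoint.

    strict=False makes load_state_dict report mismatched key names instead of
    raising on them. Tensors whose keys do match are still copied into `model`,
    and a shape mismatch on a matching key still raises, so use a throwaway
    model instance if you only want the report.
    """
    result = model.load_state_dict(state_dict, strict=False)
    return list(result.missing_keys), list(result.unexpected_keys)


if __name__ == "__main__":
    # Toy stand-in for FasterRCNN: a checkpoint whose keys carry an extra
    # "module." prefix, as a DDP-wrapped model's state_dict() would have.
    model = nn.Linear(4, 2)
    prefixed = {f"module.{k}": v for k, v in nn.Linear(4, 2).state_dict().items()}
    missing, unexpected = compare_keys(model, prefixed)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

With the real model, pass the FasterRCNN instance built for inference and `torch.load("outputs/training/smoke_training/last_model_state.pth", map_location="cpu")["model_state_dict"]`. If `unexpected` lists the same names as `missing` but with a leading `module.`, the checkpoint was most likely saved from a wrapped model.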

May I know which version of PyTorch you are using?

sovit-123 • Apr 11 '24 12:04

I am using PyTorch 2.0.1.

PhucPhamSy • Apr 11 '24 12:04

Can you please paste the command that you are using?

sovit-123 • Apr 11 '24 12:04

  1. For training:
     export CUDA_VISIBLE_DEVICES=0,1
     python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16
  2. For inference:
     python inference.py --input data/inference_data/image_1.jpg --weights outputs/training/smoke_training/last_model_state.pth

PhucPhamSy • Apr 11 '24 12:04

But when I trained on only one GPU (without distributed training), the inference command above works. Could you check inference for models trained with distributed training?

PhucPhamSy • Apr 11 '24 12:04
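A single-GPU run working while the two-GPU run fails fits a common DistributedDataParallel pitfall, though this thread does not confirm it: DDP keeps the real model in its `module` attribute, so calling `state_dict()` on the wrapper prefixes every key with `module.`. Whether this repo's `train.py` saves the wrapper's state dict is an assumption not checked here. The sketch below uses a hypothetical function name and a toy `nn.Sequential` in place of FasterRCNN, and builds a one-process gloo group on CPU purely to show the two sets of key names.

```python
import os
import tempfile

import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def ddp_key_names():
    """Hypothetical demo: wrap a toy model in DDP, return (wrapper_keys, inner_keys)."""
    # One-process "group" on CPU via the gloo backend: enough to construct a
    # DDP wrapper without GPUs or torch.distributed.launch.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        rank=0,
        world_size=1,
    )
    try:
        model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
        ddp_model = DDP(model)  # CPU module, so device_ids is left unset
        # Keys of the wrapper itself: every name gains a "module." prefix.
        wrapper_keys = list(ddp_model.state_dict().keys())
        # Keys of the unwrapped model, which a plain model can load directly.
        inner_keys = list(ddp_model.module.state_dict().keys())
    finally:
        dist.destroy_process_group()
    return wrapper_keys, inner_keys
```

In a real multi-GPU run, saving `ddp_model.module.state_dict()` (usually from rank 0 only) yields keys that load into a plain, unwrapped model such as the one `inference.py` builds.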

Oh, interesting. I will need some time to debug this case.

sovit-123 • Apr 11 '24 13:04

Many thanks. I will wait for your reply!

PhucPhamSy • Apr 11 '24 13:04

Hi, thanks a lot for such a nice repo! Is there any update on this issue? In that situation, eval.py does not work either.

ahmetoguzsaltik • May 21 '24 11:05
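For both `inference.py` and `eval.py`, one hedged workaround, assuming the missing keys do come from a `module.` prefix, is to strip that prefix before calling `load_state_dict`. The helper below is a hypothetical sketch, not code from this repo. It works on any mapping, leaves keys without the prefix untouched (so it is also safe on single-GPU checkpoints), and returns a new dict so the loaded checkpoint is not modified.

```python
from collections import OrderedDict
from typing import Mapping, TypeVar

V = TypeVar("V")


def strip_prefix(state_dict: Mapping[str, V], prefix: str = "module.") -> "OrderedDict[str, V]":
    """Hypothetical helper: copy state_dict, removing `prefix` from keys that start with it."""
    cleaned: "OrderedDict[str, V]" = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            # Drop only the leading prefix; the rest of the key name is kept as-is.
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned
```

Applied to the line from the traceback, the call would become `model.load_state_dict(strip_prefix(checkpoint['model_state_dict']))`. PyTorch also provides `torch.nn.modules.utils.consume_prefix_in_state_dict_if_present(state_dict, prefix)`, which should do the same in place (including the state dict's metadata); check that it exists in your installed version before relying on it.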

Hello. Apologies for the delayed response. Recently I have not been able to put much time into the project due to time constraints, but I plan to resume work on it soon. I will also be setting up a project page to track the relevant issues and new features.

I am happy to receive feature recommendations starting today.

sovit-123 • May 21 '24 14:05