bevfusion
KeyError: "BEVFusion: 'encoders.camera.backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table'"
I used my own lidar-only and camera-only .pth files to train the fusion model and encountered this problem. How can I solve it?
Please, did you solve it? I have the same problem.
I faced the same problem when trying to use the newly saved checkpoints from training a camera-only CenterHead detector. I am not sure whether it has anything to do with the CenterHead being different from TransFusion.
Let me be specific.
- I first train a new cam-only detector using the following:

```
torchpack dist-run -np 1 python tools/train.py \
  configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```
- The checkpoints of (1) are saved into the runs/ folder.
- Then I try to train a cam+lidar detector using one of the checkpoints saved in (2):
```
torchpack dist-run -np 1 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint runs/run-531bf67d-d3138be2/epoch_20.pth \
  --load_from pretrained/lidar-only-det.pth
```
The errors are as follows:
```
Traceback (most recent call last):
  File "tools/train.py", line 68, in main
    model = build_model(cfg.model)
  File "/home/bevfusion/mmdet3d/models/builder.py", line 41, in build_model
    return build_fusion_model(cfg, train_cfg=train_cfg, test_cfg=test_cfg)
  File "/home/bevfusion/mmdet3d/models/builder.py", line 35, in build_fusion_model
    return FUSIONMODELS.build(
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/registry.py", line 212, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "BEVFusion: 'encoders.camera.backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table'"
```
The model you trained in (1) is a single-modality model that uses only camera images for object detection. That means you cannot use it as the camera backbone when training the fusion model.
Hi fdy61,
I notice that the same "pretrained/swint-nuimages-pretrained.pth" was used as the checkpoint for training the cam-only detector, and also as the checkpoint to train the cam+lidar detector.
This is why I was under the impression that the saved cam-only checkpoint "runs/run-531bf67d-d3138be2/epoch_20.pth" would be usable to train the cam+lidar detector.
May I have your kind advice: how then can I train an appropriate camera backbone for training the cam+lidar fusion model?
Thanks
pretrained/swint-nuimages-pretrained.pth is just an image backbone model which, if I remember correctly, has the same structure as SwinTransformer. But runs/run-531bf67d-d3138be2/epoch_20.pth is an entire model, and its weight parameters have changed completely. You can print out the model weight names to see.
Hi fdy61,
Thanks for your tips. Indeed, when I run "ls -l" on the two files, they are very different:
```
ls -l pretrained/swint-nuimages-pretrained.pth run/run-531bf67d-d3138be2/epoch_20.pth
-rw-r--r-- 1 root root 110370759 Sep 26  2022 pretrained/swint-nuimages-pretrained.pth
-rw-r--r-- 1 root root 523728374 May 27 03:45 run/run-531bf67d-d3138be2_epoch_20.pth
```
I am not sure how to print and examine their model weight names. Can you kindly advise me?
So the question then becomes: how do I re-train the cam+lidar fusion model using my new image data? I believe you may have the same goal, since you mentioned in your first post, "I use my own lidar-only and camera-only pth to train the fusion model...". Do you also have a camera-only .pth checkpoint? Did you manage to solve it? If so, can you advise me, please?
Thanks
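As a concrete starting point, here is a minimal sketch of how one could print the two checkpoints' state_dict keys, as fdy61 suggests. It is an assumption rather than code from the repo: it presumes both files are ordinary PyTorch checkpoints, and that the trained one wraps its weights under a "state_dict" key, as mmcv-based runs usually do.

```python
import torch

def load_state_dict(path):
    ckpt = torch.load(path, map_location="cpu")
    # mmcv-style training checkpoints wrap the weights under "state_dict";
    # bare backbone files may store them at the top level.
    return ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

for path in ("pretrained/swint-nuimages-pretrained.pth",
             "runs/run-531bf67d-d3138be2/epoch_20.pth"):
    keys = list(load_state_dict(path).keys())
    print(f"{path}: {len(keys)} keys, first few:")
    for k in keys[:5]:
        print("   ", k)
```

Printing the keys side by side should make it visible whether the trained checkpoint's keys carry a model-level prefix that the bare backbone file lacks.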
Can I do the following instead:
```
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
  --load_from runs/run-531bf67d-d3138be2/epoch_20.pth \
  --load_from pretrained/lidar-only-det.pth
```
That is, have two "--load_from" options, loading both the cam-only checkpoint and the lidar-only checkpoint.
Thanks
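For what it's worth, --load_from is a single option, so specifying it twice would not load both checkpoints; the later value would typically just override the earlier one. A hedged workaround, if one really wanted weights from both files, would be to merge the two state_dicts offline into a single checkpoint and pass that one file to --load_from. This is a sketch only, not a tool from the repo: it assumes both checkpoints are mmcv-style dicts keyed in the full fusion-model namespace, and resolves collisions (e.g. the detection head) arbitrarily in favor of the LiDAR file.

```python
import torch

# Hypothetical merge -- not part of the bevfusion repo. Assumes both
# checkpoints are mmcv-style dicts with a "state_dict" entry keyed in the
# full fusion-model namespace (encoders.camera.*, encoders.lidar.*, ...).
cam = torch.load("runs/run-531bf67d-d3138be2/epoch_20.pth", map_location="cpu")
lidar = torch.load("pretrained/lidar-only-det.pth", map_location="cpu")

merged = dict(cam["state_dict"])    # start from the camera-only weights
merged.update(lidar["state_dict"])  # LiDAR wins on any shared key (e.g. heads)

# "cam_lidar_merged.pth" is an illustrative output name.
torch.save({"state_dict": merged}, "pretrained/cam_lidar_merged.pth")
print(f"wrote {len(merged)} merged keys")
```

The merged file could then be passed as a single --load_from; whether those weights are actually compatible with the fusion architecture is a separate question.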
```
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
  --load_from pretrained/lidar-only-det.pth
```
After some debugging, it seems that the issue is that the state_dict keys of pretrained/swint-nuimages-pretrained.pth differ from those of runs/run-531bf67d-d3138be2/epoch_20.pth by the prefix "encoders.camera.backbone". I am hoping that if I change the keys to skip this prefix, then this error will go away!
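To make that idea concrete, here is a minimal sketch of the prefix-stripping step. It is an untested assumption rather than a verified fix: it keeps only the keys under the prefix "encoders.camera.backbone.", strips that prefix so the remaining keys look like a bare SwinTransformer state_dict, and writes the result to a new file.

```python
import torch

# Hypothetical prefix-stripping script, not a verified fix. Assumes the
# trained checkpoint is an mmcv-style dict whose backbone weights live
# under keys prefixed with "encoders.camera.backbone.".
PREFIX = "encoders.camera.backbone."

ckpt = torch.load("runs/run-531bf67d-d3138be2/epoch_20.pth", map_location="cpu")
backbone = {
    k[len(PREFIX):]: v
    for k, v in ckpt["state_dict"].items()
    if k.startswith(PREFIX)
}

# "epoch_20_backbone_only.pth" is an illustrative output name.
torch.save({"state_dict": backbone}, "pretrained/epoch_20_backbone_only.pth")
print(f"kept {len(backbone)} backbone keys")
```

The resulting file could then be tried in place of swint-nuimages-pretrained.pth via --model.encoders.camera.backbone.init_cfg.checkpoint.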
Just use pretrained/swint-nuimages-pretrained.pth and it's done. As I said, runs/run-531bf67d-d3138be2/epoch_20.pth is the camera-only model, which was trained only on images, and its state_dict keys are totally different from those of pretrained/swint-nuimages-pretrained.pth.
Hi fdy61, thanks. The background is that I have trained the camera-only model, and using the "runs/run-531bf67d-d3138be2/epoch_20.pth" checkpoint I obtained mAP improvements over "pretrained/camera-only-det.pth".
Hence, my thinking is to use this "epoch_20.pth" checkpoint to train the cam+lidar fusion model.
OK, I will try to check/confirm whether the state_dict keys of "epoch_20.pth" are totally different from those of "camera-only-det.pth", or whether they only differ by the prefix "encoders.camera.backbone". I will post an update here.
Thanks
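One quick way to run that check is a key-set comparison, along the same lines as the earlier inspection sketch (again assuming mmcv-style checkpoints that nest their weights under a "state_dict" entry; the paths are the ones quoted above):

```python
import torch

def keys_of(path):
    ckpt = torch.load(path, map_location="cpu")
    # Unwrap mmcv-style checkpoints that nest weights under "state_dict".
    return set(ckpt.get("state_dict", ckpt).keys())

a = keys_of("runs/run-531bf67d-d3138be2/epoch_20.pth")
b = keys_of("pretrained/camera-only-det.pth")

print("identical key sets:", a == b)
print("keys only in epoch_20.pth:", len(a - b))
print("keys only in camera-only-det.pth:", len(b - a))
for k in sorted(a ^ b)[:10]:  # sample the symmetric difference
    print("  differs:", k)
```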
Hi fdy61,
Since we have to use pretrained/swint-nuimages-pretrained.pth to train the C+L fusion model, do you have any idea how this pretrained model was trained? Alternatively, is there a way we can replicate the training process for the pretrained models?
Thanks~