
Error when evaluating pretrained models (bevfusion-det.pth)

Open fdy61 opened this issue 1 year ago • 19 comments

When I run the command "torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth", loading lidar-only-det.pth produces the warning "The model and loaded state dict do not match exactly", and then training fails with: RuntimeError: Given groups=1, weight of size [8, 1, 1, 1], expected input[24, 6, 256, 704] to have 1 channels, but got 6 channels instead. How can I solve it?
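
For reference, the checkpoint contents can be dumped with a minimal sketch like the one below (assuming the usual nested state_dict layout inside the .pth file), to compare parameter shapes against the ones named in the RuntimeError:

import torch

# Load the checkpoint on CPU and unwrap the nested state dict if present.
ckpt = torch.load("pretrained/lidar-only-det.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

# Print every parameter name and shape; look for the [8, 1, 1, 1]
# weight mentioned in the RuntimeError above.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))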

fdy61 avatar Feb 10 '24 09:02 fdy61

I am using the command given by the author in the GitHub README.

fdy61 avatar Feb 10 '24 09:02 fdy61

Have you solved it? What was the solution?

GZF123 avatar Mar 03 '24 11:03 GZF123

I used bevfusion-det.pth and had no problem.

yuanjiechen avatar Mar 18 '24 06:03 yuanjiechen

No problem. I have solved it.

fdy61 avatar Mar 18 '24 07:03 fdy61

Have you successfully reproduced the results? I am reproducing the code, but my training accuracy is far below the paper's. How should I handle this? Can you give me some suggestions? The training is for BEVFusion det (L+C); visualization shows both modalities being used, but the accuracy is too low (NDS=0.4665). I earnestly request your reply and help! The system is Ubuntu 20.04 and the GPU is a single A100. Although I have four A100s, parallel training does not work for me, so for now I can only use a single GPU.

wyy032 avatar Apr 26 '24 06:04 wyy032

It seems that you should train the lidar-only model first to get lidar-only-det.pth, and then use lidar-only-det.pth together with swint-nuimages-pretrained.pth to train the final fusion model.
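
If I remember the README correctly, the two stages look roughly like this (the lidar-only config path is from my reading of the repo, so double-check it against your copy):

Stage 1, train the lidar-only detector: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml

Stage 2, train the fusion model from the stage-1 checkpoint: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth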

fdy61 avatar Apr 26 '24 06:04 fdy61

Thank you for the suggestion! I will try this method. May I ask whether your reproduction matches the results of the paper?

wyy032 avatar Apr 26 '24 06:04 wyy032

Yes, but I only did detection, not segmentation. For the BEVFusion detection model, you should train it with your own trained lidar model (or just use the pretrained/lidar-only-det.pth the author provides) and the image backbone pretrained/swint-nuimages-pretrained.pth. You can see the training command in README.md: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth. Whether you train on a single GPU or multiple GPUs does not affect the final results.

fdy61 avatar Apr 26 '24 06:04 fdy61

I am running this command as-is, with no changes to the code. Reading the terminal output carefully, I noticed that a lot of keys are reported as missing. Did you get this during training, and what can I do to fix it?

wyy032 avatar Apr 26 '24 09:04 wyy032

This is just a warning, not an error. The full BEVFusion detection model has a camera branch, but lidar-only-det.pth doesn't; the camera branch is loaded from swint-nuimages-pretrained.pth instead. So just run the command and it will be fine. You will find that the model trains successfully.
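
If you want to convince yourself, here is a minimal sketch (assuming the checkpoint keys follow the model's encoders.lidar / encoders.camera naming, which I have not verified against every version of the repo) that lists which branches the lidar-only checkpoint actually contains:

import torch

ckpt = torch.load("pretrained/lidar-only-det.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

# Group parameter names by their first two components, e.g. "encoders.lidar",
# to see which branches the checkpoint covers. Whatever the full fusion model
# has beyond this list (the camera branch) is what shows up as "missing keys".
modules = sorted({".".join(k.split(".")[:2]) for k in state})
print(modules)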

fdy61 avatar Apr 26 '24 09:04 fdy61

Thank you for your patience. I was able to train successfully, but the main problem is still the same as at the beginning: the accuracy is low and very different from the paper's results. I first ran three epochs on the full nuScenes dataset with a single A100, and the NDS stayed at 0.46, while the original paper reports NDS=0.7288, which is far too large a gap. I have not been able to solve this.

wyy032 avatar Apr 26 '24 10:04 wyy032

Three epochs? That is far from enough. You can learn more details from the training strategy.
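
The epoch budget lives in the config; a quick way to find it (assuming the configs keep an mmdet-style max_epochs field) is: grep -rn "max_epochs" configs/nuscenes/det/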

fdy61 avatar Apr 26 '24 10:04 fdy61

@wyy032 The author seemed to train for about 6-7 epochs, if I remember correctly.

fdy61 avatar Apr 26 '24 10:04 fdy61

I haven't run the full 6 epochs yet, as this will probably take about three days, but over the first three epochs the accuracy stayed at 0.46, and I'm not sure whether that's normal. I'm worried it won't improve later.

wyy032 avatar Apr 26 '24 10:04 wyy032

Training for only three epochs suggests the model has not yet fit the data well. I'm not sure whether you can resume from epoch_3.pth and continue training to epoch 6, or whether you need to restart from epoch 1. I suggest you follow the author's training strategy.
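
If tools/train.py follows the usual mmdet convention, resuming might look like the line below, with the run directory being a hypothetical placeholder; please verify that the script actually honours a resume_from override before relying on it (I have dropped --load_from on the assumption that the resumed checkpoint already carries the weights and optimizer state): torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --resume_from <your-run-dir>/epoch_3.pth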

fdy61 avatar Apr 26 '24 10:04 fdy61

Ok, thank you very much for your advice, I'll try again.

wyy032 avatar Apr 26 '24 11:04 wyy032

Hi, I've run the full six epochs over the last few days, but the accuracy still doesn't go up. I suspect it's a CUDA version issue and the lack of multi-GPU training. Can I ask which CUDA version you are using?

wyy032 avatar Apr 29 '24 02:04 wyy032

11.3 or 11.1 @wyy032

fdy61 avatar Apr 29 '24 04:04 fdy61

OK, thank you, I'll try again.

wyy032 avatar Apr 29 '24 04:04 wyy032

How did you solve this error? I have the same one.

Ange1ika avatar May 05 '24 12:05 Ange1ika

How did you solve it? Awaiting your reply. Thank you. @fdy61

zhujiagang avatar Jun 20 '24 08:06 zhujiagang

@fdy61 Hi, when you test the fusion model with the official command, can you reach the paper's accuracy? torchpack dist-run -np 1 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox. When I test the lidar-only and camera-only models, I get mAP=0.6468, NDS=0.6924 and mAP=0.3554, NDS=0.4121, which is very close to the paper's results, but when I test the fusion model the results are only mAP=0.6728, NDS=0.7069. What could be the reason?

zyqww avatar Jul 08 '24 09:07 zyqww