
Error when evaluating pretrained models (bevfusion-det.pth)

Open fdy61 opened this issue 1 year ago • 19 comments

When I run the command "torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth", loading lidar-only-det.pth produces the warning "The model and loaded state dict do not match exactly", and then training fails with: RuntimeError: Given groups=1, weight of size [8, 1, 1, 1], expected input[24, 6, 256, 704] to have 1 channels, but got 6 channels instead. How can I solve it?
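
For reference, the checkpoint contents can be dumped with a minimal sketch like the one below (assuming the usual nested state_dict layout inside the .pth file), to compare parameter shapes against the ones named in the RuntimeError:

import torch

# Load the checkpoint on CPU and unwrap the nested state dict if present.
ckpt = torch.load("pretrained/lidar-only-det.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

# Print every parameter name and shape; look for the [8, 1, 1, 1]
# weight mentioned in the RuntimeError above.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))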

fdy61 avatar Feb 10 '24 09:02 fdy61

I am using the command given by the author in the GitHub README.

fdy61 avatar Feb 10 '24 09:02 fdy61

Have you solved it? What was the solution?

GZF123 avatar Mar 03 '24 11:03 GZF123

I used bevfusion-det.pth and had no problem.

yuanjiechen avatar Mar 18 '24 06:03 yuanjiechen

No problem. I have solved it.

fdy61 avatar Mar 18 '24 07:03 fdy61

Have you successfully reproduced the results? I am reproducing the code, but my training accuracy is far below the paper's. How should I handle this? Can you give me some suggestions? The training is for BEVFusion det (L+C); visualization shows both modalities being used, but the accuracy is too low (NDS=0.4665). I earnestly request your reply and help! The system is Ubuntu 20.04 and the GPU is a single A100. Although I have four A100s, parallel training does not work for me, so for now I can only use a single GPU.

wyy032 avatar Apr 26 '24 06:04 wyy032

It seems that you should train the lidar-only model first to get lidar-only-det.pth, and then use lidar-only-det.pth together with swint-nuimages-pretrained.pth to train the final fusion model.
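
If I remember the README correctly, the two stages look roughly like this (the lidar-only config path is from my reading of the repo, so double-check it against your copy):

Stage 1, train the lidar-only detector: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml

Stage 2, train the fusion model from the stage-1 checkpoint: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth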

fdy61 avatar Apr 26 '24 06:04 fdy61

Thank you for the suggestion! I will try this method. May I ask whether your reproduction matches the results of the paper?

wyy032 avatar Apr 26 '24 06:04 wyy032

Yes, but I only did detection, not segmentation. For the BEVFusion detection model, you should train it with your own trained lidar model (or just use the pretrained/lidar-only-det.pth the author provides) and the image backbone pretrained/swint-nuimages-pretrained.pth. You can see the training command in README.md: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth. Whether you train on a single GPU or multiple GPUs does not affect the final results.

fdy61 avatar Apr 26 '24 06:04 fdy61

I am running this command as-is, with no changes to the code. Reading the terminal output carefully, I noticed that a lot of keys are reported as missing. Did you get this during training, and what can I do to fix it?

wyy032 avatar Apr 26 '24 09:04 wyy032

This is just a warning, not an error. The full BEVFusion detection model has a camera branch, but lidar-only-det.pth doesn't; the camera branch is loaded from swint-nuimages-pretrained.pth instead. So just run the command and it will be fine. You will find that the model trains successfully.
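
If you want to convince yourself, here is a minimal sketch (assuming the checkpoint keys follow the model's encoders.lidar / encoders.camera naming, which I have not verified against every version of the repo) that lists which branches the lidar-only checkpoint actually contains:

import torch

ckpt = torch.load("pretrained/lidar-only-det.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

# Group parameter names by their first two components, e.g. "encoders.lidar",
# to see which branches the checkpoint covers. Whatever the full fusion model
# has beyond this list (the camera branch) is what shows up as "missing keys".
modules = sorted({".".join(k.split(".")[:2]) for k in state})
print(modules)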

fdy61 avatar Apr 26 '24 09:04 fdy61

Thank you for your patience. I was able to train successfully, but the main problem is still the same as at the beginning: the accuracy is low and very different from the paper's results. I first ran three epochs on the full nuScenes dataset with a single A100, and the NDS stayed at 0.46, while the original paper reports NDS=0.7288, which is far too large a gap. I have not been able to solve this.

wyy032 avatar Apr 26 '24 10:04 wyy032

Three epochs? That is far from enough. You can learn more details from the training strategy.
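
The epoch budget lives in the config; a quick way to find it (assuming the configs keep an mmdet-style max_epochs field) is: grep -rn "max_epochs" configs/nuscenes/det/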

fdy61 avatar Apr 26 '24 10:04 fdy61

@wyy032 The author seemed to train for about 6-7 epochs, if I remember correctly.

fdy61 avatar Apr 26 '24 10:04 fdy61

I haven't run the full 6 epochs yet, as this will probably take about three days, but over the first three epochs the accuracy stayed at 0.46, and I'm not sure whether that's normal. I'm worried it won't improve later.

wyy032 avatar Apr 26 '24 10:04 wyy032

Training for only three epochs suggests the model has not yet fit the data well. I'm not sure whether you can resume from epoch_3.pth and continue training to epoch 6, or whether you need to restart from epoch 1. I suggest you follow the author's training strategy.
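
If tools/train.py follows the usual mmdet convention, resuming might look like the line below, with the run directory being a hypothetical placeholder; please verify that the script actually honours a resume_from override before relying on it (I have dropped --load_from on the assumption that the resumed checkpoint already carries the weights and optimizer state): torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --resume_from <your-run-dir>/epoch_3.pth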

fdy61 avatar Apr 26 '24 10:04 fdy61

Ok, thank you very much for your advice, I'll try again.

wyy032 avatar Apr 26 '24 11:04 wyy032

Hi, I've run the full six epochs over the last few days, but the accuracy still doesn't go up. I suspect it's a CUDA version issue and the lack of multi-GPU training. Can I ask which CUDA version you are using?

wyy032 avatar Apr 29 '24 02:04 wyy032

11.3 or 11.1 @wyy032

fdy61 avatar Apr 29 '24 04:04 fdy61

OK, thank you, I'll try again.

wyy032 avatar Apr 29 '24 04:04 wyy032

How did you solve this error? I have the same one.

Ange1ika avatar May 05 '24 12:05 Ange1ika

How did you solve it? Awaiting your reply. Thank you. @fdy61

zhujiagang avatar Jun 20 '24 08:06 zhujiagang

@fdy61 Hi, when you test the fusion model with the official command, can you reach the paper's accuracy? torchpack dist-run -np 1 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox. When I test the lidar-only and camera-only models, I get mAP=0.6468, NDS=0.6924 and mAP=0.3554, NDS=0.4121, which is very close to the paper's results, but when I test the fusion model the results are only mAP=0.6728, NDS=0.7069. What could be the reason?

zyqww avatar Jul 08 '24 09:07 zyqww