bevfusion
Performance on Val Set Lower than the official results
I am using the latest code with one change, made so that the pretrained weights load:
mmdet3d/models/vtransforms/base: line 38: add_depth_features=True -> False
The command I used for testing is:
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox
The result is lower than the official one (mAP: 68.52, NDS: 71.38), by about 1.2 mAP and 0.7 NDS:
mAP: 0.6732
mATE: 0.2868
mASE: 0.2561
mAOE: 0.3188
mAVE: 0.2513
mAAE: 0.1860
NDS: 0.7067
Eval time: 108.4s
Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.875  0.172  0.154  0.064  0.272  0.183
truck                 0.638  0.329  0.188  0.101  0.245  0.217
bus                   0.747  0.335  0.192  0.050  0.422  0.272
trailer               0.422  0.518  0.205  0.626  0.216  0.142
construction_vehicle  0.282  0.728  0.431  0.903  0.110  0.297
pedestrian            0.877  0.134  0.288  0.391  0.215  0.101
motorcycle            0.763  0.186  0.254  0.233  0.339  0.264
bicycle               0.613  0.164  0.257  0.444  0.190  0.012
traffic_cone          0.789  0.125  0.316  nan    nan    nan
barrier               0.726  0.178  0.276  0.058  nan    nan
{'object/car_ap_dist_0.5': 0.7918, 'object/car_ap_dist_1.0': 0.8803, 'object/car_ap_dist_2.0': 0.9088, 'object/car_ap_dist_4.0': 0.9194, 'object/car_trans_err': 0.1716, 'object/car_scale_err': 0.1539, 'object/car_orient_err': 0.0638, 'object/car_vel_err': 0.2723, 'object/car_attr_err': 0.1826, 'object/mATE': 0.2868, 'object/mASE': 0.2561, 'object/mAOE': 0.3188, 'object/mAVE': 0.2513, 'object/mAAE': 0.186, 'object/truck_ap_dist_0.5': 0.4327, 'object/truck_ap_dist_1.0': 0.6391, 'object/truck_ap_dist_2.0': 0.7234, 'object/truck_ap_dist_4.0': 0.7576, 'object/truck_trans_err': 0.3287, 'object/truck_scale_err': 0.1881, 'object/truck_orient_err': 0.1007, 'object/truck_vel_err': 0.2451, 'object/truck_attr_err': 0.217, 'object/construction_vehicle_ap_dist_0.5': 0.0379, 'object/construction_vehicle_ap_dist_1.0': 0.1891, 'object/construction_vehicle_ap_dist_2.0': 0.3879, 'object/construction_vehicle_ap_dist_4.0': 0.5118, 'object/construction_vehicle_trans_err': 0.728, 'object/construction_vehicle_scale_err': 0.431, 'object/construction_vehicle_orient_err': 0.9034, 'object/construction_vehicle_vel_err': 0.1098, 'object/construction_vehicle_attr_err': 0.2974, 'object/bus_ap_dist_0.5': 0.4882, 'object/bus_ap_dist_1.0': 0.7559, 'object/bus_ap_dist_2.0': 0.8607, 'object/bus_ap_dist_4.0': 0.8844, 'object/bus_trans_err': 0.3348, 'object/bus_scale_err': 0.1916, 'object/bus_orient_err': 0.0502, 'object/bus_vel_err': 0.4225, 'object/bus_attr_err': 0.2719, 'object/trailer_ap_dist_0.5': 0.1618, 'object/trailer_ap_dist_1.0': 0.3624, 'object/trailer_ap_dist_2.0': 0.5377, 'object/trailer_ap_dist_4.0': 0.6279, 'object/trailer_trans_err': 0.5183, 'object/trailer_scale_err': 0.2053, 'object/trailer_orient_err': 0.6257, 'object/trailer_vel_err': 0.2157, 'object/trailer_attr_err': 0.142, 'object/barrier_ap_dist_0.5': 0.64, 'object/barrier_ap_dist_1.0': 0.7271, 'object/barrier_ap_dist_2.0': 0.7627, 'object/barrier_ap_dist_4.0': 0.7746, 'object/barrier_trans_err': 0.1779, 
'object/barrier_scale_err': 0.2759, 'object/barrier_orient_err': 0.0578, 'object/barrier_vel_err': nan, 'object/barrier_attr_err': nan, 'object/motorcycle_ap_dist_0.5': 0.6717, 'object/motorcycle_ap_dist_1.0': 0.7783, 'object/motorcycle_ap_dist_2.0': 0.7947, 'object/motorcycle_ap_dist_4.0': 0.8061, 'object/motorcycle_trans_err': 0.186, 'object/motorcycle_scale_err': 0.254, 'object/motorcycle_orient_err': 0.233, 'object/motorcycle_vel_err': 0.3395, 'object/motorcycle_attr_err': 0.264, 'object/bicycle_ap_dist_0.5': 0.581, 'object/bicycle_ap_dist_1.0': 0.6112, 'object/bicycle_ap_dist_2.0': 0.6222, 'object/bicycle_ap_dist_4.0': 0.6367, 'object/bicycle_trans_err': 0.1643, 'object/bicycle_scale_err': 0.2571, 'object/bicycle_orient_err': 0.4443, 'object/bicycle_vel_err': 0.1903, 'object/bicycle_attr_err': 0.0115, 'object/pedestrian_ap_dist_0.5': 0.8592, 'object/pedestrian_ap_dist_1.0': 0.8731, 'object/pedestrian_ap_dist_2.0': 0.8833, 'object/pedestrian_ap_dist_4.0': 0.8935, 'object/pedestrian_trans_err': 0.1339, 'object/pedestrian_scale_err': 0.2877, 'object/pedestrian_orient_err': 0.3905, 'object/pedestrian_vel_err': 0.2151, 'object/pedestrian_attr_err': 0.1013, 'object/traffic_cone_ap_dist_0.5': 0.7681, 'object/traffic_cone_ap_dist_1.0': 0.7779, 'object/traffic_cone_ap_dist_2.0': 0.7933, 'object/traffic_cone_ap_dist_4.0': 0.8154, 'object/traffic_cone_trans_err': 0.1249, 'object/traffic_cone_scale_err': 0.3162, 'object/traffic_cone_orient_err': nan, 'object/traffic_cone_vel_err': nan, 'object/traffic_cone_attr_err': nan, 'object/nds': 0.706711898633767, 'object/map': 0.6732251261793217}
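As a sanity check on the log above, the reported NDS can be recomputed from the mean metrics with the nuScenes detection score definition, NDS = (5 * mAP + sum of (1 - min(1, error)) over the five true-positive errors) / 10. This is a minimal sketch; the function name `nds` is illustrative and is not part of the BEVFusion code.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors."""
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5; each of the five TP scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Mean metrics copied from the evaluation log above.
mean_ap = 0.6732
tp_errors = [0.2868, 0.2561, 0.3188, 0.2513, 0.1860]  # mATE, mASE, mAOE, mAVE, mAAE

score = nds(mean_ap, tp_errors)
print(f"recomputed NDS: {score:.4f}")
```

The recomputed value agrees with the logged NDS of 0.7067, so the gap to the official numbers comes from the underlying metrics rather than from how the score is aggregated.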
I made the same change:
mmdet3d/models/vtransforms/base: line 38: add_depth_features=True -> False
But I got an even lower result (on nuScenes_mini), shown below:
mAP: 0.5780
mATE: 0.4042
mASE: 0.4474
mAOE: 0.4679
mAVE: 0.4219
mAAE: 0.3251
NDS: 0.5824
Eval time: 2.2s
Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.917  0.176  0.160  0.100  0.114  0.067
truck                 0.824  0.151  0.117  0.032  0.075  0.019
bus                   0.995  0.161  0.090  0.024  0.385  0.384
trailer               0.000  1.000  1.000  1.000  1.000  1.000
construction_vehicle  0.000  1.000  1.000  1.000  1.000  1.000
pedestrian            0.918  0.120  0.262  0.377  0.211  0.131
motorcycle            0.706  0.189  0.288  0.361  0.054  0.000
bicycle               0.538  0.162  0.213  0.318  0.536  0.000
traffic_cone          0.883  0.081  0.344  nan    nan    nan
barrier               0.000  1.000  1.000  1.000  nan    nan
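The per-class numbers above suggest where most of the drop on nuScenes_mini comes from: three classes have an AP of exactly 0 and errors of 1.0, which pulls every mean down. The sketch below only redoes the arithmetic on the logged AP values. Whether those classes are rare or missing in the mini split is an assumption here, not something the log confirms.

```python
# Per-class AP copied from the nuScenes_mini log above.
ap_by_class = {
    "car": 0.917,
    "truck": 0.824,
    "bus": 0.995,
    "trailer": 0.000,
    "construction_vehicle": 0.000,
    "pedestrian": 0.918,
    "motorcycle": 0.706,
    "bicycle": 0.538,
    "traffic_cone": 0.883,
    "barrier": 0.000,
}

# mAP as logged: the mean over all ten classes.
mean_all = sum(ap_by_class.values()) / len(ap_by_class)

# Classes that contribute nothing to the mean.
zero_classes = [name for name, ap in ap_by_class.items() if ap == 0.0]

# Mean over the remaining classes, for comparison only (not an official metric).
nonzero = [ap for ap in ap_by_class.values() if ap > 0.0]
mean_nonzero = sum(nonzero) / len(nonzero)

print(f"mean AP over all classes:      {mean_all:.4f}")
print(f"classes with AP == 0:          {zero_classes}")
print(f"mean AP over non-zero classes: {mean_nonzero:.4f}")
```

The mean over all classes matches the logged mAP of 0.5780 up to rounding, so the mini-split result is not directly comparable with the full validation-set numbers earlier in the thread.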
Same problem. When I set add_depth_features=False, NDS and mAP are 0.7065 and 0.6730 respectively. However, when the flag is True, an error is thrown about the shape in depth_lss.py line 39: the input channel number should be 1, but I got 6 instead. So I changed line 39 to nn.Conv2d(6, 8, 1) (#478). This way I get much lower performance (NDS and mAP are around 0.50 and 0.35 respectively). Could someone else help explain what is happening? @zhijian-liu
Using the old branch may help. The latest code seems to have some bugs.
Hi, JinPeng. From reading the current code, I think the validation results may be slightly lower than the reported values because of a mismatch between the downloaded checkpoints and the current code (BaseDepthTransform changed compared to the old branch). I also trained the LC config with add_depth_features=False, but got even lower results.
So I'm retraining the C-only Swin-T config and the LC config with add_depth_features=True, changing the input channel number instead. Hopefully this will lead to the correct results...
If you have any ideas about how to fix the bug, please reach out to me. Thanks!
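One way to look for the kind of checkpoint/code mismatch discussed above is to compare parameter shapes by name. The sketch below works on plain dictionaries of shapes so that it runs without PyTorch. The parameter name `dtransform.0.weight` and both shapes are hypothetical and chosen only to mirror the 1-channel versus 6-channel case in this thread; they are not read from the real checkpoint.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} where the shapes differ."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Parameters missing from the checkpoint are skipped, not reported.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Hypothetical example. A 1x1 convolution with C_in input channels and
# C_out output channels has a weight of shape (C_out, C_in, 1, 1).
model_shapes = {"dtransform.0.weight": (8, 6, 1, 1)}       # code expecting 6 channels
checkpoint_shapes = {"dtransform.0.weight": (8, 1, 1, 1)}  # weights saved for 1 channel

mismatches = find_shape_mismatches(model_shapes, checkpoint_shapes)
for name, (model_shape, ckpt_shape) in mismatches.items():
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

With a real PyTorch model, the shape dictionaries could be built with something like `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, and similarly for the tensors stored in the checkpoint file. How the checkpoint nests its weights is an assumption to verify against the actual file.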
I rolled back to the stable branch, and there is no need to set 'add_depth_features=False' or change any code. The fusion det val results of the official checkpoint are good (reaching 68.7 mAP & 78.4 NDS). I'm also retraining the det model, and the val results look good so far. Hope rolling back helps you!
Hi, could you please share the version number you used as the stable version? Thanks
Try dev/fusion-configs https://github.com/mit-han-lab/bevfusion/tree/dev/fusion-configs
Thank you for your message; it has been received.