TransFusion
Have you tried TTA (e.g. double flip)?
If not, is TTA non-trivial for fusion-based methods? Any insight?
I did not use any TTA for the nuScenes submission. For fusion-based methods, I think you could augment either the 3D part or the 2D part, following the respective common practices: for example, double flip, rotation, and scaling on point clouds, and flipping on images. All these augmentations are inverse-transformed when building the correspondence between the two modalities, so they do not break cross-modal consistency. After that, the way to fuse the detection results from the multiple augmented inputs should be chosen carefully. I have tried some TTA but did not observe a significant performance improvement, perhaps because of the simple bbox fusion strategy I adopted.
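A minimal sketch of the double-flip part in Python/NumPy. It assumes a hypothetical box layout `[x, y, z, l, w, h, yaw]` (yaw counter-clockwise from +x, no velocity columns) and a hypothetical `detector` callable mapping points to `(boxes, scores)`. It mirrors the point cloud, runs the detector, and maps every prediction back. A mirror is its own inverse, so the same function undoes it. For a fusion model, the mirrored points would also need to be un-mirrored before projecting them into the images; that step is not shown.

```python
import numpy as np

# Hypothetical box layout: [x, y, z, l, w, h, yaw], yaw CCW from +x (LiDAR frame).

def mirror_points(points, negate_x=False, negate_y=False):
    """Mirror an (N, >=3) point cloud by negating x and/or y."""
    out = points.copy()
    if negate_x:
        out[:, 0] = -out[:, 0]
    if negate_y:
        out[:, 1] = -out[:, 1]
    return out

def mirror_boxes(boxes, negate_x=False, negate_y=False):
    """Apply the same mirror to (M, 7) boxes; applying it twice is the identity."""
    out = boxes.copy()
    if negate_x:
        out[:, 0] = -out[:, 0]
        out[:, 6] = np.pi - out[:, 6]   # heading (c, s) -> (-c, s)
    if negate_y:
        out[:, 1] = -out[:, 1]
        out[:, 6] = -out[:, 6]          # heading (c, s) -> (c, -s)
    out[:, 6] = (out[:, 6] + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return out

def double_flip_tta(points, detector):
    """Run `detector` on the four double-flip variants and map boxes back."""
    all_boxes, all_scores = [], []
    for nx in (False, True):
        for ny in (False, True):
            boxes, scores = detector(mirror_points(points, nx, ny))
            all_boxes.append(mirror_boxes(boxes, nx, ny))  # self-inverse
            all_scores.append(scores)
    return np.concatenate(all_boxes), np.concatenate(all_scores)
```

The pooled boxes and scores still need a fusion step (NMS or WBF) before evaluation.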
Thanks for the response. I've tested a couple of them, and unfortunately they seem to deteriorate performance (a 1.2% drop on val). Tweaking the bbox fusion hurts it even further. If my code is right, I may need more hyperparameter tuning. It would be exciting to make it work after all.
I have tried NMS with 3D IoU to fuse the boxes. Of the three augmentation strategies, 1) double flip, 2) rotation, and 3) scaling, two increase performance (though not significantly); sorry, I don't remember exactly which one deteriorates it. Maybe you can try https://github.com/ZFTurbo/Weighted-Boxes-Fusion
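The NMS-based fusion can be sketched as below. This is a simplification under stated assumptions, not the code used in this thread: the 3D IoU here is axis-aligned and ignores yaw (an exact rotated-box IoU is more involved), boxes use the hypothetical `[x, y, z, l, w, h, yaw]` layout, and in practice it would run per class over the predictions pooled from all TTA branches.

```python
import numpy as np

def aligned_iou_3d(box, boxes):
    """Axis-aligned 3D IoU between one box and N boxes.

    Simplification: yaw (column 6) is ignored, so overlap of rotated boxes
    is only approximated.
    """
    lo1, hi1 = box[:3] - box[3:6] / 2, box[:3] + box[3:6] / 2
    lo2, hi2 = boxes[:, :3] - boxes[:, 3:6] / 2, boxes[:, :3] + boxes[:, 3:6] / 2
    # Per-axis overlap length, clipped at zero, multiplied into a volume.
    inter = np.clip(np.minimum(hi1, hi2) - np.maximum(lo1, lo2), 0, None).prod(axis=1)
    union = box[3:6].prod() + boxes[:, 3:6].prod(axis=1) - inter
    return inter / union

def nms_fuse(boxes, scores, iou_thr=0.5):
    """Greedy NMS over pooled TTA predictions; returns indices of kept boxes."""
    order = np.argsort(-scores, kind="stable")  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        ious = aligned_iou_3d(boxes[i], boxes[order[1:]])
        order = order[1:][ious <= iou_thr]  # drop boxes overlapping the kept one
    return np.array(keep, dtype=int)
```

NMS keeps only the single best box per object, which may explain why it gains little: the other branches' predictions are discarded rather than averaged.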
Hi @XuyangBai, thank you for the insight. I have some questions about the NMS applied here. Have you tried the NMS provided by MMDetection3D? Did you turn to weighted boxes fusion because it did not improve performance much? Also, could you elaborate on how to use weighted boxes fusion in the TransFusion repository? Specifically, did you add a class under mmdet3d/core/post-processing and modify test_cfg and test_pipeline? Thank you so much.
After rotation TTA (6 branches) and WBF, I got a better result:
mAP: 0.7068 mATE: 0.2792 mASE: 0.2466 mAOE: 0.2807 mAVE: 0.2277 mAAE: 0.1764 NDS: 0.7323 Eval time: 91.5s
Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.895  0.167  0.146  0.085  0.210  0.175
truck                 0.678  0.307  0.176  0.060  0.208  0.223
bus                   0.793  0.307  0.169  0.041  0.366  0.219
trailer               0.528  0.468  0.202  0.533  0.194  0.156
construction_vehicle  0.315  0.759  0.426  0.828  0.122  0.292
pedestrian            0.896  0.121  0.273  0.350  0.201  0.089
motorcycle            0.801  0.183  0.231  0.234  0.373  0.246
bicycle               0.676  0.161  0.252  0.358  0.148  0.011
traffic_cone          0.773  0.115  0.309  nan    nan    nan
barrier               0.712  0.203  0.280  0.038  nan    nan
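As a sanity check, the reported NDS is consistent with the summary metrics under the nuScenes definition NDS = (5 * mAP + sum over the five TP errors of (1 - min(1, mTP))) / 10. A short Python check using the numbers above:

```python
def nuscenes_nds(m_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors
    (mATE, mASE, mAOE, mAVE, mAAE); each error is capped at 1."""
    if len(tp_errors) != 5:
        raise ValueError("expected the five nuScenes TP error metrics")
    tp_scores = [1.0 - min(1.0, e) for e in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0

nds = nuscenes_nds(0.7068, [0.2792, 0.2466, 0.2807, 0.2277, 0.1764])
print(round(nds, 4))  # 0.7323
```

Because mAP carries half the weight, the TTA gain shows up mostly through mAP, with smaller contributions from the translation and orientation errors.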
What do you mean by rotation TTA (6 branches)? @minrui-hust
Test-time augmentation (TTA): rotate the input by 6 angles, get the output boxes, rotate them back, and fuse them with weighted boxes fusion (WBF).
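A sketch of the rotate, predict, rotate-back loop, again assuming the hypothetical `[x, y, z, l, w, h, yaw]` layout and `detector` callable. The six angles below are a hypothetical, evenly spaced choice; the thread does not say which angles were used. Velocity columns, if present, would need the same rotation.

```python
import numpy as np

def rot_z(alpha):
    """2x2 rotation matrix for a CCW rotation by alpha about the z axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])

def rotate_points(points, alpha):
    out = points.copy()
    out[:, :2] = points[:, :2] @ rot_z(alpha).T  # rotate x, y of each point
    return out

def rotate_boxes(boxes, alpha):
    out = boxes.copy()
    out[:, :2] = boxes[:, :2] @ rot_z(alpha).T             # rotate box centers
    out[:, 6] = (boxes[:, 6] + alpha + np.pi) % (2 * np.pi) - np.pi  # shift yaw
    return out

def rotation_tta(points, detector, angles):
    """Run `detector` on each rotated copy and rotate its boxes back."""
    all_boxes, all_scores = [], []
    for a in angles:
        boxes, scores = detector(rotate_points(points, a))
        all_boxes.append(rotate_boxes(boxes, -a))  # undo the input rotation
        all_scores.append(scores)
    return np.concatenate(all_boxes), np.concatenate(all_scores)

# Hypothetical: six evenly spaced angles (the thread does not specify them).
ANGLES = np.linspace(0.0, 2 * np.pi, 6, endpoint=False)
```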
Thanks! Could you share the WBF code? There is no implementation that handles yaw angles in https://github.com/ZFTurbo/Weighted-Boxes-Fusion
Sorry, I can't share the code because it is deeply coupled with our company's code base. But fusing headings is not hard; the only thing to take care of is inverting the heading when the heading difference is larger than 180 degrees.
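One reading of this, sketched as a simplified WBF-style fusion and not the poster's implementation: boxes are clustered greedily by BEV center distance (a hypothetical criterion and `dist_thr` in place of WBF's IoU test, compared against each cluster's top-scoring box rather than a running fused box), box parameters are score-weighted averages, and each heading is first wrapped to within 180 degrees of the cluster's top-scoring heading, so that 179 and -179 degrees average to about 180 rather than 0. The fused score down-weights clusters found by fewer than `n_aug` branches, in the spirit of WBF.

```python
import numpy as np

def wrap_angle(a):
    """Wrap angles to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def fuse_yaw(yaws, weights):
    """Weighted mean heading; yaws[0] is the cluster's top-scoring heading."""
    ref = yaws[0]
    # Shift each heading by a multiple of 2*pi so it lies within pi of ref.
    aligned = ref + wrap_angle(yaws - ref)
    return wrap_angle(np.sum(aligned * weights) / np.sum(weights))

def wbf_3d(boxes, scores, n_aug, dist_thr=1.0):
    """Simplified WBF for (N, 7) boxes [x, y, z, l, w, h, yaw] (hypothetical layout)."""
    order = np.argsort(-scores, kind="stable")
    clusters = []  # lists of indices; c[0] is the top-scoring member
    for i in order:
        for c in clusters:
            if np.linalg.norm(boxes[i, :2] - boxes[c[0], :2]) < dist_thr:
                c.append(i)
                break
        else:
            clusters.append([i])
    fused_boxes, fused_scores = [], []
    for c in clusters:
        w, b = scores[c], boxes[c]
        fused = np.empty(7)
        fused[:6] = (b[:, :6] * w[:, None]).sum(axis=0) / w.sum()  # weighted mean
        fused[6] = fuse_yaw(b[:, 6], w)
        fused_boxes.append(fused)
        # Penalize clusters supported by fewer than n_aug branches.
        fused_scores.append(w.mean() * min(len(c), n_aug) / n_aug)
    return np.array(fused_boxes), np.array(fused_scores)
```

A circular mean (weighted average of sin and cos, then `arctan2`) is an alternative way to avoid the wrap-around problem.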
Is the rotation applied only to point clouds, or to images as well? And could you tell me which hyperparameters you changed?