robust-detection-benchmark Question about stylized dataset improve robustness

Hi, I trained CenterNet (Objects as Points) with a resnet50dcn backbone on the combined VOC2007 and VOC2012 trainval datasets. The results are below:

resnet50dcn without stylized data: 75.19
resnet50dcn with combined data: 72.61

These results seem to conflict with the paper!

Jan 01 '20 04:01 feixiangdekaka

Hi @feixiangdekaka, I assume the performance is on clean VOC07? If so, I am not sure how it conflicts with our results, which mostly focus on performance on corrupted images.

I guess you mean that the model trained on combined data performs about 2.6% worse than the baseline (72.61 vs. 75.19), compared to the 0.1% drop reported for our model in the paper. However, your model (CenterNet) and backbone (ResNet50 with DCN/deformable convolutions) are very different from ours (Faster R-CNN with ResNet50).

The cause might be overfitting. CenterNet should perform on par with or better than Faster R-CNN, and the same is true when adding deformable convolutions to the backbone. However, your clean performance is 5% lower than that of our model. I encountered similar issues when I tried to use more powerful models on Pascal VOC. I experimented with deeper backbones, deformable convolutions and Cascade R-CNN models, but the more powerful models would often perform worse than the simple Faster R-CNN baseline. The most logical explanation is overfitting. This would also explain the rather strong effect you see: overfitting will affect the combined model more, because training takes twice as long and the stylized training data is quite different from the clean test images.

It would be interesting to see the performance of the two models you trained under corruption (i.e. on VOC07-C). It would surprise me if the combined model performed worse there. If that were the case, we should investigate, as combined training has improved performance under corruption wherever we have tried it so far (in the paper and beyond).
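For reference, the benchmark summarizes robustness as mean performance under corruption (mPC, the average over corruption types and severity levels 1 to 5) and relative performance under corruption (rPC = mPC / clean performance). The Python snippet below is only a minimal sketch of that bookkeeping: the function names are made up for illustration and are not part of the benchmark code, and the numbers are placeholders, not measured results.

```python
from statistics import mean


def mean_performance_under_corruption(per_corruption):
    """Average over severities within each corruption, then over corruptions."""
    return mean(mean(scores) for scores in per_corruption.values())


def relative_performance_under_corruption(per_corruption, clean_score):
    """mPC expressed as a fraction of the clean (severity 0) score."""
    return mean_performance_under_corruption(per_corruption) / clean_score


# Placeholder scores for severities 1..5 (NOT real measurements).
# The clean run (severity 0) is passed separately and is not averaged in.
example = {
    "gaussian_noise": [0.6, 0.5, 0.4, 0.3, 0.2],
    "glass_blur": [0.5, 0.4, 0.3, 0.2, 0.1],
}
mpc = mean_performance_under_corruption(example)
rpc = relative_performance_under_corruption(example, clean_score=0.7)
print(f"mPC = {mpc:.3f}, rPC = {rpc:.3f}")
```

Comparing mPC and rPC of the two models side by side would show whether the combined model trades clean accuracy for robustness.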

Jan 01 '20 23:01 michaelisc

Testing gaussian_noise at severity 0 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 22.6 task/s, elapsed: 219s, ETA: 0s
Testing gaussian_noise at severity 1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.7 task/s, elapsed: 228s, ETA: 0s
Testing gaussian_noise at severity 2 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.7 task/s, elapsed: 229s, ETA: 0s
Testing gaussian_noise at severity 3 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.5 task/s, elapsed: 230s, ETA: 0s
Testing gaussian_noise at severity 4 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.4 task/s, elapsed: 231s, ETA: 0s
Testing gaussian_noise at severity 5 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.7 task/s, elapsed: 229s, ETA: 0s
Testing shot_noise at severity 1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.9 task/s, elapsed: 226s, ETA: 0s
Testing shot_noise at severity 2 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.7 task/s, elapsed: 228s, ETA: 0s
Testing shot_noise at severity 3 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.6 task/s, elapsed: 229s, ETA: 0s
Testing shot_noise at severity 4 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.7 task/s, elapsed: 228s, ETA: 0s
Testing shot_noise at severity 5 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.9 task/s, elapsed: 227s, ETA: 0s
Testing impulse_noise at severity 1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.5 task/s, elapsed: 230s, ETA: 0s
Testing impulse_noise at severity 2 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.3 task/s, elapsed: 232s, ETA: 0s
Testing impulse_noise at severity 3 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.3 task/s, elapsed: 232s, ETA: 0s
Testing impulse_noise at severity 4 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.1 task/s, elapsed: 235s, ETA: 0s
Testing impulse_noise at severity 5 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.1 task/s, elapsed: 235s, ETA: 0s
Testing defocus_blur at severity 1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.8 task/s, elapsed: 227s, ETA: 0s
Testing defocus_blur at severity 2 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.8 task/s, elapsed: 227s, ETA: 0s
Testing defocus_blur at severity 3 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.6 task/s, elapsed: 229s, ETA: 0s
Testing defocus_blur at severity 4 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.6 task/s, elapsed: 229s, ETA: 0s
Testing defocus_blur at severity 5 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 21.5 task/s, elapsed: 230s, ETA: 0s
Testing glass_blur at severity 1 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 0.8 task/s, elapsed: 6521s, ETA: 0s
Testing glass_blur at severity 2 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 4952/4952, 1.5 task/s, elapsed: 3272s, ETA: 0s
Testing glass_blur at severity 3 [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 3399/4952, 0.5 task/s, elapsed: 6743s, ETA: 3081s

glass_blur takes much longer than the earlier corruptions.
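One way to put a number on this is to sum the elapsed time per corruption type straight from the test log. The Python snippet below is a rough sketch: the regular expression is written against the log format shown above, the helper name is made up, and the sample log is shortened (the progress bars are abbreviated).

```python
import re
from collections import defaultdict

# Three lines taken from the log above, with shortened progress bars.
LOG = (
    "Testing defocus_blur at severity 5 [>>>>] 4952/4952, 21.5 task/s, elapsed: 230s, ETA: 0s\n"
    "Testing glass_blur at severity 1 [>>>>] 4952/4952, 0.8 task/s, elapsed: 6521s, ETA: 0s\n"
    "Testing glass_blur at severity 2 [>>>>] 4952/4952, 1.5 task/s, elapsed: 3272s, ETA: 0s\n"
)

PATTERN = re.compile(
    r"Testing (\w+) at severity (\d+) \[[> ]*\] (\d+)/(\d+), "
    r"[\d.]+ task/s, elapsed: (\d+)s"
)


def elapsed_per_corruption(log_text):
    """Sum the elapsed seconds of completed runs for each corruption type."""
    totals = defaultdict(int)
    for name, _severity, done, total, seconds in PATTERN.findall(log_text):
        if done == total:  # ignore runs that are still in progress
            totals[name] += int(seconds)
    return dict(totals)


totals = elapsed_per_corruption(LOG)
for name, seconds in totals.items():
    print(f"{name}: {seconds} s")
```

On these three lines, two severities of glass_blur already add up to 9793 s, against 230 s for one severity of defocus_blur.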

Actually, I also tested your resnet50_fpn_1x_pascal_voc on combinations of clean and stylized datasets. The results are below.

Only clean:
+-------------+------+-------+--------+-------+
| class       | gts  | dets  | recall | ap    |
+-------------+------+-------+--------+-------+
| aeroplane   | 285  | 979   | 0.926  | 0.846 |
| bicycle     | 337  | 1200  | 0.955  | 0.862 |
| bird        | 459  | 1133  | 0.913  | 0.838 |
| boat        | 263  | 1446  | 0.909  | 0.721 |
| bottle      | 469  | 2023  | 0.846  | 0.707 |
| bus         | 213  | 794   | 0.934  | 0.841 |
| car         | 1201 | 3911  | 0.964  | 0.880 |
| cat         | 358  | 1019  | 0.972  | 0.890 |
| chair       | 756  | 5556  | 0.893  | 0.654 |
| cow         | 244  | 997   | 0.963  | 0.853 |
| diningtable | 206  | 2366  | 0.932  | 0.716 |
| dog         | 489  | 1416  | 0.978  | 0.881 |
| horse       | 348  | 1012  | 0.954  | 0.872 |
| motorbike   | 325  | 1240  | 0.923  | 0.831 |
| person      | 4528 | 13766 | 0.945  | 0.853 |
| pottedplant | 480  | 2104  | 0.777  | 0.537 |
| sheep       | 242  | 807   | 0.942  | 0.831 |
| sofa        | 239  | 1599  | 0.954  | 0.774 |
| train       | 282  | 1047  | 0.940  | 0.836 |
| tvmonitor   | 308  | 1205  | 0.909  | 0.796 |
+-------------+------+-------+--------+-------+
| mAP         |      |       |        | 0.801 |
+-------------+------+-------+--------+-------+

Combination of stylized (0.8) and clean:
+-------------+------+-------+--------+-------+
| class       | gts  | dets  | recall | ap    |
+-------------+------+-------+--------+-------+
| aeroplane   | 285  | 1109  | 0.926  | 0.843 |
| bicycle     | 337  | 1445  | 0.964  | 0.886 |
| bird        | 459  | 1481  | 0.917  | 0.822 |
| boat        | 263  | 1896  | 0.909  | 0.715 |
| bottle      | 469  | 2225  | 0.829  | 0.695 |
| bus         | 213  | 962   | 0.953  | 0.863 |
| car         | 1201 | 4274  | 0.961  | 0.878 |
| cat         | 358  | 1347  | 0.972  | 0.876 |
| chair       | 756  | 5900  | 0.903  | 0.682 |
| cow         | 244  | 1020  | 0.959  | 0.856 |
| diningtable | 206  | 2643  | 0.932  | 0.757 |
| dog         | 489  | 1857  | 0.986  | 0.877 |
| horse       | 348  | 1287  | 0.960  | 0.863 |
| motorbike   | 325  | 1563  | 0.957  | 0.857 |
| person      | 4528 | 16607 | 0.953  | 0.855 |
| pottedplant | 480  | 3705  | 0.842  | 0.551 |
| sheep       | 242  | 1008  | 0.934  | 0.828 |
| sofa        | 239  | 1717  | 0.958  | 0.784 |
| train       | 282  | 1457  | 0.947  | 0.858 |
| tvmonitor   | 308  | 1572  | 0.919  | 0.785 |
+-------------+------+-------+--------+-------+
| mAP         |      |       |        | 0.807 |
+-------------+------+-------+--------+-------+

Combination of stylized (0.8), stylized (0.5) and clean:
+-------------+------+-------+--------+-------+
| class       | gts  | dets  | recall | ap    |
+-------------+------+-------+--------+-------+
| aeroplane   | 285  | 939   | 0.923  | 0.830 |
| bicycle     | 337  | 1106  | 0.947  | 0.870 |
| bird        | 459  | 1233  | 0.882  | 0.784 |
| boat        | 263  | 1420  | 0.897  | 0.699 |
| bottle      | 469  | 1936  | 0.800  | 0.663 |
| bus         | 213  | 804   | 0.958  | 0.865 |
| car         | 1201 | 3892  | 0.954  | 0.874 |
| cat         | 358  | 1170  | 0.975  | 0.881 |
| chair       | 756  | 4865  | 0.852  | 0.660 |
| cow         | 244  | 891   | 0.955  | 0.843 |
| diningtable | 206  | 1824  | 0.922  | 0.741 |
| dog         | 489  | 1531  | 0.967  | 0.859 |
| horse       | 348  | 1096  | 0.951  | 0.862 |
| motorbike   | 325  | 1102  | 0.935  | 0.850 |
| person      | 4528 | 13536 | 0.941  | 0.853 |
| pottedplant | 480  | 2570  | 0.800  | 0.563 |
| sheep       | 242  | 1038  | 0.934  | 0.809 |
| sofa        | 239  | 1537  | 0.958  | 0.798 |
| train       | 282  | 1095  | 0.954  | 0.854 |
| tvmonitor   | 308  | 1501  | 0.890  | 0.758 |
+-------------+------+-------+--------+-------+
| mAP         |      |       |        | 0.796 |
+-------------+------+-------+--------+-------+

Clean performance increases when combining stylized (0.8) and clean data (0.807 vs. 0.801 mAP), and decreases when combining stylized (0.5), stylized (0.8) and clean data (0.796 mAP). It is worth noting, though, that robustness increases with more stylized data.
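To see where the 0.6-point gain of stylized (0.8) + clean over clean-only comes from, the per-class APs can be compared directly. The Python snippet below is a small sketch using the AP columns of the first two tables above; the variable and function names are only illustrative.

```python
CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
# AP columns copied from the "only clean" and "stylized (0.8) and clean" tables.
AP_CLEAN = [
    0.846, 0.862, 0.838, 0.721, 0.707, 0.841, 0.880, 0.890, 0.654, 0.853,
    0.716, 0.881, 0.872, 0.831, 0.853, 0.537, 0.831, 0.774, 0.836, 0.796,
]
AP_COMBINED = [
    0.843, 0.886, 0.822, 0.715, 0.695, 0.863, 0.878, 0.876, 0.682, 0.856,
    0.757, 0.877, 0.863, 0.857, 0.855, 0.551, 0.828, 0.784, 0.858, 0.785,
]


def mean_ap(aps):
    """mAP is the unweighted mean of the per-class APs."""
    return sum(aps) / len(aps)


# Per-class change (combined minus clean), sorted from largest drop to largest gain.
deltas = sorted(
    ((name, round(combined - clean, 3))
     for name, clean, combined in zip(CLASSES, AP_CLEAN, AP_COMBINED)),
    key=lambda item: item[1],
)

print(f"mAP clean-only: {mean_ap(AP_CLEAN):.3f}")
print(f"mAP combined:   {mean_ap(AP_COMBINED):.3f}")
for name, delta in deltas:
    print(f"{name:<12} {delta:+.3f}")
```

The change is not uniform: some classes gain several points of AP while others lose one or two, so the mAP difference alone hides a fair amount of per-class variation.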

Also, how can I fix the overfitting problem? Thanks!

Jan 04 '20 06:01 feixiangdekaka

Hello, I am a beginner. Could you please tell me how to replace the backbone in the code with the backbone I want to test? Thank you.

May 11 '21 10:05 milkgodzzz
