PETR

About the details of pre-trained VoVNet checkpoints

Open · zen-d opened this issue 2 years ago • 5 comments

Hello @yingfei1016, I am a little confused about the pre-trained VoVNet checkpoints.

The link provided by DETR3D points to dd3d_det_final.pth, which was first trained on DDAD15M and then on nuScenes train+val (if I understand correctly).

What, then, are the training details of the checkpoint linked in your work? As its name "fcos3d_vovnet_imgbackbone-remapped.pth" suggests, was it trained with FCOS3D? Was it trained only on the nuScenes train set, without the val set? If this checkpoint was indeed trained with FCOS3D, what is the rationale for this choice? I would guess its performance is no better than that of the DD3D-pretrained checkpoint used by DETR3D.

Thanks in advance.

zen-d, Dec 04 '22 10:12

@yingfei1016 Could you please explain a bit more? Thanks.

zen-d, Dec 09 '22 03:12

Hi, you can refer to https://github.com/megvii-research/PETR/issues/75#issuecomment-1323666167. The result on the val set with "dd3d_det_final.pth" may be slightly higher than the test result, which indicates overfitting and is abnormal.

"fcos3d_vovnet_imgbackbone-remapped.pth" was trained with FCOS3D for 24 epochs, followed by 24 epochs of fine-tuning. You can refer to https://github.com/open-mmlab/mmdetection3d/tree/master/configs/pgd for the fine-tuning setup.

yingfei1016, Dec 09 '22 03:12

@yingfei1016 Thanks for your kind reply! The current pre-training settings are clear to me now. By the way, for evaluation on the nuScenes val set, have you ever tried this pre-training setting: first pre-train on the DDAD15M dataset, then further train on the nuScenes train set with DD3D? Would it be better or worse than fcos3d_vovnet_imgbackbone-remapped.pth?

zen-d, Dec 09 '22 11:12

Hi,
(1) I haven't pre-trained on the nuScenes train set with DD3D.
(2) I have compared pretrained weights from PGD and FCOS3D. Pre-training with PGD performs slightly worse than with FCOS3D. Therefore, I don't think that using a network with stronger depth estimation, such as PGD, improves performance.
(3) In fact, the pretrained model improves localization only slightly. In my opinion, the feature semantics are trained more comprehensively during pre-training, so object categories are predicted more accurately, and the gain mainly comes from mAP and mAOE.

yingfei1016, Dec 10 '22 11:12
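The point in (3), that gains show up in mAP and mAOE rather than in localization, matters because the nuScenes Detection Score (NDS) weights mAP by 5 and adds one term per true-positive error metric (mATE, mASE, mAOE, mAVE, mAAE): NDS = (5·mAP + Σ(1 − min(1, mTP))) / 10. The sketch below computes NDS from those metrics. The metric values in the usage lines are made up for illustration and are not results from this thread.

```python
# nuScenes Detection Score (NDS):
# NDS = (5 * mAP + sum over the five TP error metrics of (1 - min(1, mTP))) / 10
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nds(m_ap: float, tp_errors: dict) -> float:
    """Combine mAP with the five true-positive error metrics into NDS."""
    # Each error metric is clipped at 1, so a very poor metric contributes 0.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0


# Hypothetical metrics: better semantics raise mAP and lower mAOE,
# while the localization error (mATE) stays the same.
baseline = {"mATE": 0.80, "mASE": 0.28, "mAOE": 0.60, "mAVE": 0.90, "mAAE": 0.20}
improved = dict(baseline, mAOE=0.50)
print(round(nds(0.30, baseline), 3))  # 0.372
print(round(nds(0.33, improved), 3))  # 0.397
```

With mATE unchanged, a 3-point mAP gain and a 0.1 drop in mAOE still lift NDS by about 2.5 points in this made-up example.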

Thanks a lot for sharing these experiences! I had the same guess, that depth pre-training is not the key to effective pre-training, but I was not sure before, since it somewhat contradicts DD3D's claim that depth pre-training helps. Your experimental results clearly support the superiority of semantic pre-training over depth pre-training.

zen-d, Dec 14 '22 13:12