PETR: Cannot reproduce PETR

When running [petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py), the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2s

This is much lower than the reported result. Also, when I set with_position=False, the accuracy is extremely low: 0.0887 mAP and 0.2230 NDS.
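As a quick sanity check, the NDS above can be recomputed from the other reported metrics. The sketch below assumes the standard nuScenes detection score definition (five times mAP plus the five clipped true-positive scores, divided by ten); the formula and the helper name `nds` are not part of the original post.

```python
# Recompute NDS from the metrics in the post.
# Assumption: NDS = (5 * mAP + sum(1 - min(1, err) over the five TP errors)) / 10,
# i.e. the usual nuScenes detection score. `nds` is a hypothetical helper name.

def nds(m_ap, tp_errors):
    # Each TP error is clipped at 1 so that it contributes a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0

reported = {
    "mATE": 0.8507,
    "mASE": 0.2785,
    "mAOE": 0.6519,
    "mAVE": 1.0027,  # above 1, so it contributes a score of 0
    "mAAE": 0.2668,
}

score = nds(0.3022, reported.values())
print(round(score, 4))
```

Under that assumption the numbers are internally consistent, so the gap to the reported result comes from training rather than from how the evaluation was summarized.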

Dec 16 '22 05:12 maggiesong7

Hi, did you modify the batch size or other parameters? Can you share your config?

Dec 21 '22 03:12 yingfei1016

Hi, I only changed the dataset path. Here is my config:

point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0] class_names = [ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ] dataset_type = 'CustomNuScenesDataset' data_root = './data/nuscenes/' input_modality = dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False) file_client_args = dict(backend='disk') train_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict( type='ObjectRangeFilter', point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]), dict( type='ObjectNameFilter', classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=True), dict( type='GlobalRotScaleTransImage', rot_range=[-0.3925, 0.3925], translation_std=[0, 0, 0], scale_ratio_range=[0.95, 1.05], reverse_angle=True, training=True), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict(type='Collect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img']) ] test_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), 
dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ] eval_pipeline = [ dict( type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5, file_client_args=dict(backend='disk')), dict( type='LoadPointsFromMultiSweeps', sweeps_num=10, file_client_args=dict(backend='disk')), dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' ], with_label=False), dict(type='Collect3D', keys=['points']) ] data = dict( samples_per_gpu=1, workers_per_gpu=4, train=dict( type='CustomNuScenesDataset', data_root='./data/nuscenes/', ann_file='./data/nuscenes/nuscenes_infos_train.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False), dict( type='ObjectRangeFilter', point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]), dict( type='ObjectNameFilter', classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=True), dict( type='GlobalRotScaleTransImage', rot_range=[-0.3925, 0.3925], translation_std=[0, 0, 0], scale_ratio_range=[0.95, 1.05], reverse_angle=True, training=True), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='DefaultFormatBundle3D', 
class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='Collect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img']) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False), test_mode=False, box_type_3d='LiDAR', use_valid_flag=True), val=dict( type='CustomNuScenesDataset', data_root='data/nuscenes/', ann_file='data/nuscenes/nuscenes_infos_val.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False), test_mode=True, box_type_3d='LiDAR'), test=dict( type='CustomNuScenesDataset', data_root='data/nuscenes/', ann_file='data/nuscenes/nuscenes_infos_val.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), 
H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=False), test_mode=True, box_type_3d='LiDAR')) evaluation = dict( interval=1, pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipImage', data_aug_conf=dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict(type='Collect3D', keys=['img']) ]) ]) checkpoint_config = dict(interval=1) log_config = dict( interval=50, hooks=[dict(type='TextLoggerHook'), dict(type='TensorboardLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' work_dir = 'work_dirs/petr_r50dcn_gridmask_p4/' load_from = None resume_from = None workflow = [('train', 1)] opencv_num_threads = 0 mp_start_method = 'fork' backbone_norm_cfg = dict(type='LN', 
requires_grad=True) plugin = True plugin_dir = 'projects/mmdet3d_plugin/' voxel_size = [0.2, 0.2, 8] img_norm_cfg = dict( mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) model = dict( type='Petr3D', use_grid_mask=True, img_backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(2, 3), frozen_stages=-1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', with_cp=True, dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), pretrained='ckpts/resnet50_msra-5891d200.pth'), img_neck=dict( type='CPFPN', in_channels=[1024, 2048], out_channels=256, num_outs=2), pts_bbox_head=dict( type='PETRHead', num_classes=10, in_channels=256, num_query=900, LID=True, with_position=True, with_multiview=True, position_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], normedlinear=False, transformer=dict( type='PETRTransformer', decoder=dict( type='PETRTransformerDecoder', return_intermediate=True, num_layers=6, transformerlayers=dict( type='PETRTransformerDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1), dict( type='PETRMultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1) ], feedforward_channels=2048, ffn_dropout=0.1, with_cp=True, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')))), bbox_coder=dict( type='NMSFreeCoder', post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], max_num=300, voxel_size=[0.2, 0.2, 8], num_classes=10), positional_encoding=dict( type='SinePositionalEncoding3D', num_feats=128, normalize=True), loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), loss_bbox=dict(type='L1Loss', loss_weight=0.25), loss_iou=dict(type='GIoULoss', loss_weight=0.0)), train_cfg=dict( pts=dict( grid_size=[512, 512, 1], voxel_size=[0.2, 0.2, 8], point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], 
out_size_factor=4, assigner=dict( type='HungarianAssigner3D', cls_cost=dict(type='FocalLossCost', weight=2.0), reg_cost=dict(type='BBox3DL1Cost', weight=0.25), iou_cost=dict(type='IoUCost', weight=0.0), pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])))) db_sampler = dict() ida_aug_conf = dict( resize_lim=(0.8, 1.0), final_dim=(512, 1408), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True) optimizer = dict( type='AdamW', lr=0.0002, paramwise_cfg=dict(custom_keys=dict(img_backbone=dict(lr_mult=0.1))), weight_decay=0.01) optimizer_config = dict( type='Fp16OptimizerHook', loss_scale=512.0, grad_clip=dict(max_norm=35, norm_type=2)) lr_config = dict( policy='CosineAnnealing', warmup='linear', warmup_iters=500, warmup_ratio=0.3333333333333333, min_lr_ratio=0.001) total_epochs = 24 find_unused_parameters = False runner = dict(type='EpochBasedRunner', max_epochs=24) gpu_ids = range(0, 8)

Dec 22 '22 06:12 maggiesong7

Hi,

The config looks fine. Can you tell me the number of GPUs and the versions of Python and mmdet3d? Python 3.8 may reduce performance somewhat.

Dec 23 '22 07:12 yingfei1016

I train on 8 2080 Ti GPUs. I have trained the model with two different Python versions, 3.7.6 and 3.6.5, both with mmdet3d 1.0.0. Also, when I set with_position=False, the accuracy is extremely low: 0.0887 mAP and 0.2230 NDS. As I understand it, setting with_position=False is just an ablation of the 3D PE module. Can you explain that?

Dec 24 '22 09:12 maggiesong7

Hi,

(1) When using mmdet3d 1.0, have you seen https://github.com/megvii-research/PETR/issues/71#issuecomment-1318191277 ? reverse_angle must be False in GlobalRotScaleTransImage. (2) Yes, setting with_position=False corresponds to a result in the ablation study.
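For point (1), a minimal sketch of the changed pipeline entry. All values are copied from the config posted earlier in this thread; only reverse_angle differs. The variable name `global_rot_scale_trans` is hypothetical and used only for illustration.

```python
# Pipeline entry from the posted config with reverse_angle flipped.
# `global_rot_scale_trans` is a hypothetical name for this illustration;
# in the config the dict sits inside the training pipeline list.
global_rot_scale_trans = dict(
    type='GlobalRotScaleTransImage',
    rot_range=[-0.3925, 0.3925],
    translation_std=[0, 0, 0],
    scale_ratio_range=[0.95, 1.05],
    reverse_angle=False,  # was True in the posted config
    training=True)
```

In the dumped config above this entry appears in both train_pipeline and data.train.pipeline, so both copies would need the same value.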

When with_position=False, the intrinsics and extrinsics are not used in the model. In fact, PETR can work without intrinsics and extrinsics, thanks to global attention. The low performance is mainly due to ResizeCropFlipImage and GlobalRotScaleTransImage. These data augmentations greatly change the intrinsics and extrinsics during training, so the network can't overfit the camera parameters of the dataset. Once these augmentations are removed, ResNet-50 should reach more than 27% mAP. But we don't think it's meaningful to overfit the dataset.
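To make the point about augmentation concrete, here is a rough sketch of how a resize followed by a crop changes the effective intrinsics. It is a generic pinhole-camera illustration, not PETR's ResizeCropFlipImage code: the intrinsic values and the helper names `matmul3` and `augmented_intrinsics` are assumptions. The scale 0.88 and crop offset 280 are example values that take a 1600x900 image to the 1408x512 final_dim in the posted config.

```python
# A resize by `scale` and a crop at offset (crop_x, crop_y) map pixel (u, v)
# to (scale * u - crop_x, scale * v - crop_y). That is the same as
# left-multiplying the intrinsic matrix K by a 3x3 augmentation matrix.

def matmul3(a, b):
    # Plain 3x3 matrix product on nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def augmented_intrinsics(K, scale, crop_x, crop_y):
    A = [[scale, 0.0, -crop_x],
         [0.0, scale, -crop_y],
         [0.0, 0.0, 1.0]]
    return matmul3(A, K)

# Hypothetical intrinsics for a 1600x900 camera (not taken from nuScenes).
K = [[1260.0, 0.0, 800.0],
     [0.0, 1260.0, 450.0],
     [0.0, 0.0, 1.0]]

# 1600x900 scaled by 0.88 gives 1408x792; cropping 280 rows from the top
# leaves 1408x512.
K_aug = augmented_intrinsics(K, scale=0.88, crop_x=0.0, crop_y=280.0)
for row in K_aug:
    print([round(v, 3) for v in row])
```

With a random scale drawn per sample, the focal length and principal point seen by the network differ from image to image, which is why a model that ignores the camera parameters cannot simply memorize them.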

Dec 24 '22 10:12 yingfei1016

I noticed that StreamPETR still sets reverse_angle=True although it uses mmdet3d=1.0.0rc6. Have I missed something?

Sep 12 '23 03:09 xiaosu-zhu

The rotation matrix is different.
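One possible reading of this, sketched generically: a rotation about the z-axis can be written with either sign convention, and the two matrices are transposes of each other, so an angle that is right under one convention has to be negated under the other. The code below is not taken from PETR, StreamPETR, or mmdet3d; the function names are hypothetical.

```python
import math

def rot_z(theta):
    # Counter-clockwise convention: [[cos, -sin], [sin, cos]].
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rot_z_transposed(theta):
    # Opposite convention (the transpose): [[cos, sin], [-sin, cos]].
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def apply(mat, point):
    # Multiply a 2x2 matrix by a 2D point.
    return [mat[0][0] * point[0] + mat[0][1] * point[1],
            mat[1][0] * point[0] + mat[1][1] * point[1]]

theta = 0.3925  # upper end of rot_range in the posted config
p = [1.0, 0.0]

a = apply(rot_z(theta), p)
b = apply(rot_z_transposed(theta), p)       # rotates the other way
c = apply(rot_z_transposed(-theta), p)      # negated angle matches `a`

same_after_negation = all(abs(x - y) < 1e-12 for x, y in zip(a, c))
print(a, b, same_after_negation)
```

Whether a flag like reverse_angle should be on therefore depends on which matrix convention the rest of the code uses, not on the library version alone.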

Sep 12 '23 06:09 yingfei1016

Thanks, got it. 👍

Sep 19 '23 01:09 xiaosu-zhu

https://github.com/megvii-research/PETR/issues/86#issue-1499621424

Oct 01 '23 23:10 Vendulamrdka95

When running [petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py), the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2s

This is much lower than the reported result. Also, when I set with_position=False, the accuracy is extremely low: 0.0887 mAP and 0.2230 NDS.

Oct 02 '23 00:10 Vendulamrdka95
