Cannot reproduce PETR

Open maggiesong7 opened this issue 2 years ago • 10 comments

When running [petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py), the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2s

This is much lower than the reported result. Also, when I set with_position=False, the accuracy is extremely low: 0.0887 mAP and 0.2230 NDS.
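As a sanity check on the numbers above, the NDS can be recomputed from the other metrics. The sketch below assumes the standard nuScenes definition, NDS = (5 * mAP + sum over the five true-positive errors of (1 - min(1, error))) / 10; the function name is illustrative and not part of any devkit.

```python
def nuscenes_detection_score(mAP, tp_errors):
    """Recompute NDS from mAP and the five mean true-positive errors.

    Assumes the standard nuScenes definition:
    NDS = (5 * mAP + sum(1 - min(1, err))) / 10
    """
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mAP + sum(tp_scores)) / 10.0


# Metrics reported in this issue.
reported_errors = {
    "mATE": 0.8507,
    "mASE": 0.2785,
    "mAOE": 0.6519,
    "mAVE": 1.0027,  # errors above 1 are clipped, so this term contributes 0
    "mAAE": 0.2668,
}
nds = nuscenes_detection_score(0.3022, reported_errors)
print(f"NDS = {nds:.4f}")
```

Under that definition the recomputed value agrees with the reported 0.3463, so the evaluation output is internally consistent and the gap to the published numbers has to come from training rather than from how the score was aggregated.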

maggiesong7 avatar Dec 16 '22 05:12 maggiesong7

Hi, did you modify the batch size or other parameters? Can you share your config?

yingfei1016 avatar Dec 21 '22 03:12 yingfei1016

Hi, I only changed the dataset path. Here is my config:

point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
dataset_type = 'CustomNuScenesDataset'
data_root = './data/nuscenes/'
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
file_client_args = dict(backend='disk')
train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(
        type='ObjectRangeFilter',
        point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    dict(
        type='ResizeCropFlipImage',
        data_aug_conf=dict(
            resize_lim=(0.8, 1.0),
            final_dim=(512, 1408),
            bot_pct_lim=(0.0, 0.0),
            rot_lim=(0.0, 0.0),
            H=900,
            W=1600,
            rand_flip=True),
        training=True),
    dict(
        type='GlobalRotScaleTransImage',
        rot_range=[-0.3925, 0.3925],
        translation_std=[0, 0, 0],
        scale_ratio_range=[0.95, 1.05],
        reverse_angle=True,
        training=True),
    dict(
        type='NormalizeMultiviewImage',
        mean=[103.53, 116.28, 123.675],
        std=[1.0, 1.0, 1.0],
        to_rgb=False),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
        type='DefaultFormatBundle3D',
        class_names=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    dict(type='Collect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]
test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(
        type='ResizeCropFlipImage',
        data_aug_conf=dict(
            resize_lim=(0.8, 1.0),
            final_dim=(512, 1408),
            bot_pct_lim=(0.0, 0.0),
            rot_lim=(0.0, 0.0),
            H=900,
            W=1600,
            rand_flip=True),
        training=False),
    dict(
        type='NormalizeMultiviewImage',
        mean=[103.53, 116.28, 123.675],
        std=[1.0, 1.0, 1.0],
        to_rgb=False),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='DefaultFormatBundle3D',
                class_names=[
                    'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
                    'barrier', 'motorcycle', 'bicycle', 'pedestrian',
                    'traffic_cone'
                ],
                with_label=False),
            dict(type='Collect3D', keys=['img'])
        ])
]
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        file_client_args=dict(backend='disk')),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=10,
        file_client_args=dict(backend='disk')),
    dict(
        type='DefaultFormatBundle3D',
        class_names=[
            'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
            'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
        ],
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    train=dict(
        type='CustomNuScenesDataset',
        data_root='./data/nuscenes/',
        ann_file='./data/nuscenes/nuscenes_infos_train.pkl',
        pipeline=[
            dict(type='LoadMultiViewImageFromFiles', to_float32=True),
            dict(
                type='LoadAnnotations3D',
                with_bbox_3d=True,
                with_label_3d=True,
                with_attr_label=False),
            dict(
                type='ObjectRangeFilter',
                point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]),
            dict(
                type='ObjectNameFilter',
                classes=[
                    'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
                    'barrier', 'motorcycle', 'bicycle', 'pedestrian',
                    'traffic_cone'
                ]),
            dict(
                type='ResizeCropFlipImage',
                data_aug_conf=dict(
                    resize_lim=(0.8, 1.0),
                    final_dim=(512, 1408),
                    bot_pct_lim=(0.0, 0.0),
                    rot_lim=(0.0, 0.0),
                    H=900,
                    W=1600,
                    rand_flip=True),
                training=True),
            dict(
                type='GlobalRotScaleTransImage',
                rot_range=[-0.3925, 0.3925],
                translation_std=[0, 0, 0],
                scale_ratio_range=[0.95, 1.05],
                reverse_angle=True,
                training=True),
            dict(
                type='NormalizeMultiviewImage',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='PadMultiViewImage', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=[
                    'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
                    'barrier', 'motorcycle', 'bicycle', 'pedestrian',
                    'traffic_cone'
                ]),
            dict(
                type='Collect3D',
                keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
        ],
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ],
        modality=dict(
            use_lidar=False,
            use_camera=True,
            use_radar=False,
            use_map=False,
            use_external=False),
        test_mode=False,
        box_type_3d='LiDAR',
        use_valid_flag=True),
    val=dict(
        type='CustomNuScenesDataset',
        data_root='data/nuscenes/',
        ann_file='data/nuscenes/nuscenes_infos_val.pkl',
        pipeline=[
            dict(type='LoadMultiViewImageFromFiles', to_float32=True),
            dict(
                type='ResizeCropFlipImage',
                data_aug_conf=dict(
                    resize_lim=(0.8, 1.0),
                    final_dim=(512, 1408),
                    bot_pct_lim=(0.0, 0.0),
                    rot_lim=(0.0, 0.0),
                    H=900,
                    W=1600,
                    rand_flip=True),
                training=False),
            dict(
                type='NormalizeMultiviewImage',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='PadMultiViewImage', size_divisor=32),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1333, 800),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(
                        type='DefaultFormatBundle3D',
                        class_names=[
                            'car', 'truck', 'construction_vehicle', 'bus',
                            'trailer', 'barrier', 'motorcycle', 'bicycle',
                            'pedestrian', 'traffic_cone'
                        ],
                        with_label=False),
                    dict(type='Collect3D', keys=['img'])
                ])
        ],
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ],
        modality=dict(
            use_lidar=False,
            use_camera=True,
            use_radar=False,
            use_map=False,
            use_external=False),
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type='CustomNuScenesDataset',
        data_root='data/nuscenes/',
        ann_file='data/nuscenes/nuscenes_infos_val.pkl',
        pipeline=[
            dict(type='LoadMultiViewImageFromFiles', to_float32=True),
            dict(
                type='ResizeCropFlipImage',
                data_aug_conf=dict(
                    resize_lim=(0.8, 1.0),
                    final_dim=(512, 1408),
                    bot_pct_lim=(0.0, 0.0),
                    rot_lim=(0.0, 0.0),
                    H=900,
                    W=1600,
                    rand_flip=True),
                training=False),
            dict(
                type='NormalizeMultiviewImage',
                mean=[103.53, 116.28, 123.675],
                std=[1.0, 1.0, 1.0],
                to_rgb=False),
            dict(type='PadMultiViewImage', size_divisor=32),
            dict(
                type='MultiScaleFlipAug3D',
                img_scale=(1333, 800),
                pts_scale_ratio=1,
                flip=False,
                transforms=[
                    dict(
                        type='DefaultFormatBundle3D',
                        class_names=[
                            'car', 'truck', 'construction_vehicle', 'bus',
                            'trailer', 'barrier', 'motorcycle', 'bicycle',
                            'pedestrian', 'traffic_cone'
                        ],
                        with_label=False),
                    dict(type='Collect3D', keys=['img'])
                ])
        ],
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ],
        modality=dict(
            use_lidar=False,
            use_camera=True,
            use_radar=False,
            use_map=False,
            use_external=False),
        test_mode=True,
        box_type_3d='LiDAR'))
evaluation = dict(
    interval=1,
    pipeline=[
        dict(type='LoadMultiViewImageFromFiles', to_float32=True),
        dict(
            type='ResizeCropFlipImage',
            data_aug_conf=dict(
                resize_lim=(0.8, 1.0),
                final_dim=(512, 1408),
                bot_pct_lim=(0.0, 0.0),
                rot_lim=(0.0, 0.0),
                H=900,
                W=1600,
                rand_flip=True),
            training=False),
        dict(
            type='NormalizeMultiviewImage',
            mean=[103.53, 116.28, 123.675],
            std=[1.0, 1.0, 1.0],
            to_rgb=False),
        dict(type='PadMultiViewImage', size_divisor=32),
        dict(
            type='MultiScaleFlipAug3D',
            img_scale=(1333, 800),
            pts_scale_ratio=1,
            flip=False,
            transforms=[
                dict(
                    type='DefaultFormatBundle3D',
                    class_names=[
                        'car', 'truck', 'construction_vehicle', 'bus',
                        'trailer', 'barrier', 'motorcycle', 'bicycle',
                        'pedestrian', 'traffic_cone'
                    ],
                    with_label=False),
                dict(type='Collect3D', keys=['img'])
            ])
    ])
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = 'work_dirs/petr_r50dcn_gridmask_p4/'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
backbone_norm_cfg = dict(type='LN', requires_grad=True)
plugin = True
plugin_dir = 'projects/mmdet3d_plugin/'
voxel_size = [0.2, 0.2, 8]
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
model = dict(
    type='Petr3D',
    use_grid_mask=True,
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN2d', requires_grad=False),
        norm_eval=True,
        style='caffe',
        with_cp=True,
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True),
        pretrained='ckpts/resnet50_msra-5891d200.pth'),
    img_neck=dict(
        type='CPFPN', in_channels=[1024, 2048], out_channels=256, num_outs=2),
    pts_bbox_head=dict(
        type='PETRHead',
        num_classes=10,
        in_channels=256,
        num_query=900,
        LID=True,
        with_position=True,
        with_multiview=True,
        position_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
        normedlinear=False,
        transformer=dict(
            type='PETRTransformer',
            decoder=dict(
                type='PETRTransformerDecoder',
                return_intermediate=True,
                num_layers=6,
                transformerlayers=dict(
                    type='PETRTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.1),
                        dict(
                            type='PETRMultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.1)
                    ],
                    feedforward_channels=2048,
                    ffn_dropout=0.1,
                    with_cp=True,
                    operation_order=('self_attn', 'norm', 'cross_attn',
                                     'norm', 'ffn', 'norm')))),
        bbox_coder=dict(
            type='NMSFreeCoder',
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],
            max_num=300,
            voxel_size=[0.2, 0.2, 8],
            num_classes=10),
        positional_encoding=dict(
            type='SinePositionalEncoding3D', num_feats=128, normalize=True),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=0.25),
        loss_iou=dict(type='GIoULoss', loss_weight=0.0)),
    train_cfg=dict(
        pts=dict(
            grid_size=[512, 512, 1],
            voxel_size=[0.2, 0.2, 8],
            point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],
            out_size_factor=4,
            assigner=dict(
                type='HungarianAssigner3D',
                cls_cost=dict(type='FocalLossCost', weight=2.0),
                reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
                iou_cost=dict(type='IoUCost', weight=0.0),
                pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]))))
db_sampler = dict()
ida_aug_conf = dict(
    resize_lim=(0.8, 1.0),
    final_dim=(512, 1408),
    bot_pct_lim=(0.0, 0.0),
    rot_lim=(0.0, 0.0),
    H=900,
    W=1600,
    rand_flip=True)
optimizer = dict(
    type='AdamW',
    lr=0.0002,
    paramwise_cfg=dict(custom_keys=dict(img_backbone=dict(lr_mult=0.1))),
    weight_decay=0.01)
optimizer_config = dict(
    type='Fp16OptimizerHook',
    loss_scale=512.0,
    grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    min_lr_ratio=0.001)
total_epochs = 24
find_unused_parameters = False
runner = dict(type='EpochBasedRunner', max_epochs=24)
gpu_ids = range(0, 8)

maggiesong7 avatar Dec 22 '22 06:12 maggiesong7

Hi,

The config looks fine. Can you tell me the number of GPUs and the versions of Python and mmdet3d? Python 3.8 may cost some performance.
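A small script makes it easy to answer this kind of question precisely. The sketch below is only a suggestion: the distribution names in the default tuple are assumptions about how the packages were installed, and `importlib.metadata` needs Python 3.8 or newer (on 3.6/3.7 the `importlib_metadata` backport exposes the same interface).

```python
import platform
from importlib import metadata  # standard library from Python 3.8


def report_versions(packages=("torch", "mmcv-full", "mmdet", "mmdet3d")):
    """Return {name: version or None} for the interpreter and each package.

    The default package names are assumed distribution names; adjust them
    to match the local environment.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed under this name
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output next to the config removes one round trip from a reproduction report.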

yingfei1016 avatar Dec 23 '22 07:12 yingfei1016

I train on 8 2080 Ti GPUs. I have trained the model with two different Python versions, 3.7.6 and 3.6.5, both with mmdet3d 1.0.0. Also, when I set with_position=False, the accuracy is extremely low: 0.0887 mAP and 0.2230 NDS. As I understand it, setting with_position=False is just an ablation of the 3D PE module. Can you explain that?

maggiesong7 avatar Dec 24 '22 09:12 maggiesong7

Hi,

(1) When using mmdet3d 1.0, have you seen https://github.com/megvii-research/PETR/issues/71#issuecomment-1318191277 ? reverse_angle must be False in GlobalRotScaleTransImage. (2) Yes, setting with_position=False corresponds to a result in the ablation study. [image]

When with_position=False, the intrinsics and extrinsics are not used by the model. In fact, PETR can work without intrinsics and extrinsics, thanks to global attention. The low performance is mainly due to ResizeCropFlipImage and GlobalRotScaleTransImage. These data augmentations greatly change the intrinsics and extrinsics during training, so the network cannot overfit to the camera parameters of the dataset. Once these augmentations are removed, ResNet-50 should reach more than 27% mAP. But we don't think it is meaningful to overfit the dataset.
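To make the point about augmentations concrete: a resize followed by a crop changes the effective camera intrinsics of every training sample, so a model that never sees the calibration cannot rely on one fixed image-to-3D mapping. The sketch below is generic pinhole-camera geometry, not PETR's implementation; the function names, the intrinsic values, and the crop offsets are made up for illustration.

```python
def adjust_intrinsics(K, resize, crop_x0, crop_y0):
    """Return the 3x3 intrinsics after scaling by `resize`, then cropping.

    K is a row-major 3x3 list without skew; (crop_x0, crop_y0) is the
    top-left corner of the crop, in pixels of the resized image.
    """
    fx, fy = K[0][0] * resize, K[1][1] * resize
    cx = K[0][2] * resize - crop_x0
    cy = K[1][2] * resize - crop_y0
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def project(K, point):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return (K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2])


# Hypothetical intrinsics for a 1600x900 image.
K = [[1260.0, 0.0, 800.0], [0.0, 1260.0, 450.0], [0.0, 0.0, 1.0]]
point = (2.0, 1.0, 20.0)  # metres, camera frame

# Two draws of the augmentation give two different effective cameras.
for resize, x0, y0 in [(0.9, 16.0, 298.0), (1.0, 96.0, 388.0)]:
    K_aug = adjust_intrinsics(K, resize, x0, y0)
    u, v = project(K_aug, point)
    print(f"resize={resize}: pixel=({u:.1f}, {v:.1f})")
```

The same 3D point lands on a different pixel for each draw, which is why the position embedding (or, without it, a fixed camera setup) matters for recovering 3D locations.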

yingfei1016 avatar Dec 24 '22 10:12 yingfei1016

I noticed that StreamPETR still sets reverse_angle=True, but it uses mmdet3d=1.0.0rc6. Have I missed something?

xiaosu-zhu avatar Sep 12 '23 03:09 xiaosu-zhu

The rotation matrix is different.
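One way to read this reply, offered as an interpretation rather than something the thread confirms: negating the angle of a rotation about the vertical axis yields the transposed (inverse) matrix, so two code bases that build the rotation matrix with opposite sign conventions need opposite settings of a flag like reverse_angle to describe the same physical rotation. The helper names below are hypothetical.

```python
import math


def rot_z(angle):
    """3x3 rotation about the z axis, counter-clockwise for positive angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


angle = 0.3925  # upper bound of rot_range in the posted config
forward = rot_z(angle)
reverse = rot_z(-angle)
forward_t = transpose(forward)

# R(-a) equals R(a)^T: code that uses the transposed convention needs
# the negated angle to express the same rotation.
matches = all(
    math.isclose(reverse[i][j], forward_t[i][j], abs_tol=1e-12)
    for i in range(3)
    for j in range(3)
)
print(matches)
```

If the image-side transform and the box rotation end up using opposite conventions, the augmentation rotates the ground truth one way and the camera geometry the other, which would be consistent with the elevated orientation error reported at the top of this issue.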

yingfei1016 avatar Sep 12 '23 06:09 yingfei1016

Thanks, got it. 👍

xiaosu-zhu avatar Sep 19 '23 01:09 xiaosu-zhu

https://github.com/megvii-research/PETR/issues/86#issue-1499621424

Vendulamrdka95 avatar Oct 01 '23 23:10 Vendulamrdka95

When running [petr_r50dcn_gridmask_p4.py](https://github.com/megvii-research/PETR/blob/main/projects/configs/petr/petr_r50dcn_gridmask_p4.py), the accuracy I got was: mAP: 0.3022 mATE: 0.8507 mASE: 0.2785 mAOE: 0.6519 mAVE: 1.0027 mAAE: 0.2668 NDS: 0.3463 Eval time: 302.2s

This is much lower than the reported result. Also, when I set with_position=False, the accuracy is extremely low: 0.0887 mAP and 0.2230 NDS.

Vendulamrdka95 avatar Oct 02 '23 00:10 Vendulamrdka95