MapTR

increase point_cloud_range

Open bitwangdan opened this issue 1 year ago • 25 comments

Hi, thanks for your great work! I want to increase the range to [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]. I modified "MapTRNMSFreeCoder" and point_cloud_range, but the results are not very good. Can you give some suggestions?

bitwangdan, Feb 28 '23 09:02

@bitwangdan Hi, I had the same problem before. There is a bug when you use an asymmetric range: the patch_size of LocalMap is (w/2, h/2) on each side, but the origin is still (0, 0). You can check the code below.

https://github.com/hustvl/MapTR/blob/3b0c7d6b634193657023ebf755f6628b08100806/projects/mmdet3d_plugin/datasets/nuscenes_map_dataset.py#L85

https://github.com/hustvl/MapTR/blob/3b0c7d6b634193657023ebf755f6628b08100806/projects/mmdet3d_plugin/datasets/nuscenes_map_dataset.py#L828
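To illustrate the mismatch described above, here is a minimal sketch. It assumes the map patch is an axis-aligned box whose width and height are taken from the range and whose center is supplied separately. `range_center` and `patch_extent` are hypothetical helpers for illustration, not functions from the MapTR code base.

```python
def range_center(pc_range):
    """Return the (x, y) midpoint of [x_min, y_min, z_min, x_max, y_max, z_max]."""
    x_min, y_min, _, x_max, y_max, _ = pc_range
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def patch_extent(pc_range, center=(0.0, 0.0)):
    """Return (x_min, x_max, y_min, y_max) of a patch sized from pc_range
    and centered on `center` (hypothetical helper)."""
    x_min, y_min, _, x_max, y_max, _ = pc_range
    half_w = (x_max - x_min) / 2.0
    half_h = (y_max - y_min) / 2.0
    cx, cy = center
    return (cx - half_w, cx + half_w, cy - half_h, cy + half_h)


pc_range = [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]

# Centered on the origin, the patch covers x in [-37.5, 37.5],
# which is not the requested [-15, 60].
print(patch_extent(pc_range))

# Centered on the range midpoint (22.5, 0.0), the patch covers x in [-15, 60].
print(patch_extent(pc_range, center=range_center(pc_range)))
```

With a symmetric range the midpoint is (0, 0), so both calls agree, which would explain why the problem only shows up with asymmetric ranges.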

zxczrx123, Mar 06 '23 08:03

@zxczrx123 Thanks, I have found this problem. There is one more point to pay attention to: the parameter num_vec needs to be increased.

bitwangdan, Mar 06 '23 08:03

@bitwangdan By the way, increasing the range greatly increases the complexity of the instances, and a method with a fixed num_vec seems to have difficulty dealing with lines of different lengths.

zxczrx123, Mar 06 '23 08:03

@zxczrx123 Yes, when I increase the data range, the metrics for many categories drop significantly. I have not found a better way than increasing the parameter num_vec.

bitwangdan, Mar 06 '23 09:03

@zxczrx123 Hi, have you tried adding temporal features like BEVFormer? The results of my experiment are not very good.

bitwangdan, Mar 08 '23 01:03

@LegendBC Hi, I have added temporal features like BEVFormer, but the results of my experiment are not very good. Are you experimenting with temporal features in your code?

bitwangdan, Mar 09 '23 06:03

> @LegendBC Hi, I have added temporal features like BEVFormer, but the results of my experiment are not very good. Are you experimenting with temporal features in your code?

We have tried the temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.

LegendBC, Mar 10 '23 01:03

@LegendBC Thank you for your reply. I have added lidar information, and the mAP on my dataset has improved a lot. When I add temporal information like BEVFormer, the mAP drops a lot. Maybe this temporal fusion method is not suitable for MapTR. I will try other temporal methods, and I also hope that you can find a suitable temporal method for MapTR.

bitwangdan, Mar 10 '23 01:03

> @zxczrx123 Hi, have you tried adding temporal features like BEVFormer? The results of my experiment are not very good.

I have not used temporal features. Can you share your results?

zxczrx123, Mar 10 '23 13:03

@zxczrx123 Hi, I experimented on my own dataset. The metrics dropped a lot.

bitwangdan, Mar 13 '23 06:03

@zxczrx123 Hi, have you ever encountered this situation? After increasing the point_cloud_range, mAP drops a lot at the 0.5 threshold, but at the 1.0 and 1.5 thresholds it is basically normal.
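One plausible reading of this pattern is that predictions are roughly in the right place but slightly offset, so they pass the looser thresholds and fail only the strict one. The sketch below assumes the 0.5 / 1.0 / 1.5 values are Chamfer-distance thresholds in meters and uses a symmetric mean-of-nearest-distances form; `chamfer_distance` and `matches_at_thresholds` are hypothetical helpers, and the repository's evaluation code may differ in its details.

```python
import numpy as np


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two polylines given as (N, 2) points."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Pairwise Euclidean distances, shape (len(pred), len(gt)).
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    pred_to_gt = dists.min(axis=1).mean()  # each predicted point to nearest GT point
    gt_to_pred = dists.min(axis=0).mean()  # each GT point to nearest predicted point
    return 0.5 * (pred_to_gt + gt_to_pred)


def matches_at_thresholds(pred, gt, thresholds=(0.5, 1.0, 1.5)):
    """Map each threshold (meters) to whether the prediction falls within it."""
    d = chamfer_distance(pred, gt)
    return {t: bool(d <= t) for t in thresholds}


# A predicted line offset laterally by 0.8 m from the ground truth.
gt = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
pred = [[0.0, 0.8], [1.0, 0.8], [2.0, 0.8]]
print(matches_at_thresholds(pred, gt))
```

In this toy case the prediction counts as a match at 1.0 and 1.5 but not at 0.5, which is the same shape of result as reported above.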

bitwangdan, Mar 23 '23 00:03

@bitwangdan In my case, it drops at all thresholds.

zxczrx123, Apr 12 '23 02:04

Probably because the rotate_center used in BEVFormer is not at (0, 0). Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error? [image]

zx2624, Apr 26 '23 02:04

@zx2624 Debug with: TORCH_DISTRIBUTED_DEBUG=DETAIL bash tools/dist_train.sh **config

I adjusted this parameter (rotate_center), but the result is still wrong.

bitwangdan, Apr 26 '23 02:04

> @zx2624 Debug with: TORCH_DISTRIBUTED_DEBUG=DETAIL bash tools/dist_train.sh **config
> I adjusted this parameter (rotate_center), but the result is still wrong.

Any code?

zx2624, Apr 26 '23 08:04

Same problem. Here is my config:

    _base_ = [
        '../datasets/custom_nus-3d.py',
        '../_base_/default_runtime.py'
    ]

    plugin = True
    plugin_dir = 'projects/mmdet3d_plugin/'

    # If point cloud range is changed, the models should also change their point
    # cloud range accordingly
    # point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
    point_cloud_range = [-15.0, -60.0, -2.0, 15.0, 60.0, 2.0]
    voxel_size = [0.15, 0.15, 4]

    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

    # For nuScenes we usually do 10-class detection
    class_names = [
        'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
        'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
    ]
    # map has classes: divider, ped_crossing, boundary
    map_classes = ['divider', 'ped_crossing','boundary']
    # fixed_ptsnum_per_line = 20
    # map_classes = ['divider',]
    fixed_ptsnum_per_gt_line = 40 # now only support fixed_pts > 0
    fixed_ptsnum_per_pred_line = 40
    eval_use_same_gt_sample_num_flag=True
    num_map_classes = len(map_classes)

    input_modality = dict(
        use_lidar=False,
        use_camera=True,
        use_radar=False,
        use_map=False,
        use_external=True)

    _dim_ = 256
    _pos_dim_ = _dim_//2
    _ffn_dim_ = _dim_*2
    _num_levels_ = 1
    # bev_h_ = 50
    # bev_w_ = 50
    bev_h_ = 400
    bev_w_ = 100
    queue_length = 1 # each sequence contains queue_length frames.

    model = dict(
        type='MapTR',
        use_grid_mask=True,
        video_test_mode=False,
        pretrained=dict(img='ckpts/resnet50-19c8e357.pth'),
        img_backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(3,),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=False),
            norm_eval=True,
            style='pytorch'),
        img_neck=dict(
            type='FPN',
            in_channels=[2048],
            out_channels=_dim_,
            start_level=0,
            add_extra_convs='on_output',
            num_outs=_num_levels_,
            relu_before_extra_convs=True),
        pts_bbox_head=dict(
            type='MapTRHead',
            bev_h=bev_h_,
            bev_w=bev_w_,
            num_query=900,
            num_vec=100,
            num_pts_per_vec=fixed_ptsnum_per_pred_line, # one bbox
            num_pts_per_gt_vec=fixed_ptsnum_per_gt_line,
            dir_interval=1,
            query_embed_type='instance_pts',
            transform_method='minmax',
            gt_shift_pts_pattern='v2',
            num_classes=num_map_classes,
            in_channels=_dim_,
            sync_cls_avg_factor=True,
            with_box_refine=True,
            as_two_stage=False,
            code_size=2,
            code_weights=[1.0, 1.0, 1.0, 1.0],
            transformer=dict(
                type='MapTRPerceptionTransformer',
                rotate_prev_bev=True,
                use_shift=True,
                use_can_bus=True,
                embed_dims=_dim_,
                encoder=dict(
                    type='BEVFormerEncoder',
                    num_layers=1,
                    pc_range=point_cloud_range,
                    num_points_in_pillar=4,
                    return_intermediate=False,
                    transformerlayers=dict(
                        type='BEVFormerLayer',
                        attn_cfgs=[
                            dict(
                                type='TemporalSelfAttention',
                                embed_dims=_dim_,
                                num_levels=1),
                            dict(
                                type='GeometrySptialCrossAttention',
                                pc_range=point_cloud_range,
                                attention=dict(
                                    type='GeometryKernelAttention',
                                    embed_dims=_dim_,
                                    num_heads=4,
                                    dilation=1,
                                    kernel_size=(3,5),
                                    num_levels=_num_levels_),
                                embed_dims=_dim_,
                            )
                        ],
                        feedforward_channels=_ffn_dim_,
                        ffn_dropout=0.1,
                        operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                         'ffn', 'norm'))),
                decoder=dict(
                    type='MapTRDecoder',
                    num_layers=6,
                    return_intermediate=True,
                    transformerlayers=dict(
                        type='DetrTransformerDecoderLayer',
                        attn_cfgs=[
                            dict(
                                type='MultiheadAttention',
                                embed_dims=_dim_,
                                num_heads=8,
                                dropout=0.1),
                            dict(
                                type='CustomMSDeformableAttention',
                                embed_dims=_dim_,
                                num_levels=1),
                        ],
                        feedforward_channels=_ffn_dim_,
                        ffn_dropout=0.1,
                        operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                         'ffn', 'norm')))),
            bbox_coder=dict(
                type='MapTRNMSFreeCoder',
                # post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
                post_center_range=[-20, -65, -20, -65, 20, 65, 20, 65],
                pc_range=point_cloud_range,
                max_num=50,
                voxel_size=voxel_size,
                num_classes=num_map_classes),
            positional_encoding=dict(
                type='LearnedPositionalEncoding',
                num_feats=_pos_dim_,
                row_num_embed=bev_h_,
                col_num_embed=bev_w_,
                ),
            loss_cls=dict(
                type='FocalLoss',
                use_sigmoid=True,
                gamma=2.0,
                alpha=0.25,
                loss_weight=2.0),
            loss_bbox=dict(type='L1Loss', loss_weight=0.0),
            loss_iou=dict(type='GIoULoss', loss_weight=0.0),
            loss_pts=dict(type='PtsL1Loss',
                          loss_weight=5.0),
            loss_dir=dict(type='PtsDirCosLoss', loss_weight=0.005)),
        # model training and testing settings
        train_cfg=dict(pts=dict(
            grid_size=[512, 512, 1],
            voxel_size=voxel_size,
            point_cloud_range=point_cloud_range,
            out_size_factor=4,
            assigner=dict(
                type='MapTRAssigner',
                cls_cost=dict(type='FocalLossCost', weight=2.0),
                reg_cost=dict(type='BBoxL1Cost', weight=0.0, box_format='xywh'),
                # reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
                # iou_cost=dict(type='IoUCost', weight=1.0), # Fake cost. This is just to make it compatible with DETR head.
                iou_cost=dict(type='IoUCost', iou_mode='giou', weight=0.0),
                pts_cost=dict(type='OrderedPtsL1Cost',
                          weight=5),
                pc_range=point_cloud_range))))

    dataset_type = 'CustomNuScenesLocalMapDataset'
    data_root = 'data/nuscenes/'
    file_client_args = dict(backend='disk')

    train_pipeline = [
        dict(type='LoadMultiViewImageFromFiles', to_float32=True),
        dict(type='PhotoMetricDistortionMultiViewImage'),
        dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),
        dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
        dict(type='ObjectNameFilter', classes=class_names),
        dict(type='NormalizeMultiviewImage', **img_norm_cfg),
        dict(type='RandomScaleImageMultiViewImage', scales=[0.5]),
        dict(type='PadMultiViewImage', size_divisor=32),
        dict(type='DefaultFormatBundle3D', class_names=class_names),
        dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
    ]

    test_pipeline = [
        dict(type='LoadMultiViewImageFromFiles', to_float32=True),
        dict(type='NormalizeMultiviewImage', **img_norm_cfg),
        dict(
            type='MultiScaleFlipAug3D',
            img_scale=(1600, 900),
            pts_scale_ratio=1,
            flip=False,
            transforms=[
                dict(type='RandomScaleImageMultiViewImage', scales=[0.5]),
                dict(type='PadMultiViewImage', size_divisor=32),
                dict(
                    type='DefaultFormatBundle3D',
                    class_names=class_names,
                    with_label=False),
                dict(type='CustomCollect3D', keys=['img'])
            ])
    ]

    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=4,
        train=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'nuscenes_infos_temporal_train.pkl',
            pipeline=train_pipeline,
            classes=class_names,
            modality=input_modality,
            test_mode=False,
            use_valid_flag=True,
            bev_size=(bev_h_, bev_w_),
            pc_range=point_cloud_range,
            fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
            eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
            padding_value=-10000,
            map_classes=map_classes,
            queue_length=queue_length,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR'),
        val=dict(type=dataset_type,
                 data_root=data_root,
                 ann_file=data_root + 'nuscenes_infos_temporal_val.pkl',
                 map_ann_file=data_root + 'nuscenes_map_anns_val.json',
                 pipeline=test_pipeline,
                 bev_size=(bev_h_, bev_w_),
                 pc_range=point_cloud_range,
                 fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
                 eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
                 padding_value=-10000,
                 map_classes=map_classes,
                 classes=class_names,
                 modality=input_modality,
                 samples_per_gpu=1),
        test=dict(type=dataset_type,
                  data_root=data_root,
                  ann_file=data_root + 'nuscenes_infos_temporal_val.pkl',
                  map_ann_file=data_root + 'nuscenes_map_anns_val.json',
                  pipeline=test_pipeline,
                  bev_size=(bev_h_, bev_w_),
                  pc_range=point_cloud_range,
                  fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
                  eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
                  padding_value=-10000,
                  map_classes=map_classes,
                  classes=class_names,
                  modality=input_modality),
        shuffler_sampler=dict(type='DistributedGroupSampler'),
        nonshuffler_sampler=dict(type='DistributedSampler')
    )

    optimizer = dict(
        type='AdamW',
        lr=6e-4,
        paramwise_cfg=dict(
            custom_keys={
                'img_backbone': dict(lr_mult=0.1),
            }),
        weight_decay=0.01)

    optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

    # learning policy
    lr_config = dict(
        policy='CosineAnnealing',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=1.0 / 3,
        min_lr_ratio=1e-3)
    total_epochs = 24
    # total_epochs = 50
    # evaluation = dict(interval=1, pipeline=test_pipeline)
    evaluation = dict(interval=2, pipeline=test_pipeline, metric='chamfer')

    runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)

    log_config = dict(
        interval=50,
        hooks=[
            dict(type='TextLoggerHook'),
            dict(type='TensorboardLoggerHook')
        ])
    fp16 = dict(loss_scale=512.)
    checkpoint_config = dict(interval=1)

Can you give some suggestions?
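When changing the range, one quick sanity check is the physical size of a BEV cell implied by the config. The sketch below assumes bev_h spans the y extent and bev_w spans the x extent of point_cloud_range; `bev_cell_size` is a hypothetical helper, not part of MapTR.

```python
def bev_cell_size(pc_range, bev_h, bev_w):
    """Return (meters per BEV row, meters per BEV column).

    Assumes bev_h covers the y extent and bev_w covers the x extent
    of pc_range = [x_min, y_min, z_min, x_max, y_max, z_max].
    """
    x_min, y_min, _, x_max, y_max, _ = pc_range
    return ((y_max - y_min) / bev_h, (x_max - x_min) / bev_w)


# Values taken from the config in this comment.
print(bev_cell_size([-15.0, -60.0, -2.0, 15.0, 60.0, 2.0], bev_h=400, bev_w=100))
```

Under that assumption, the config above gives 0.3 m per cell in both directions, so the BEV grid itself is square even though the range is four times longer than it is wide.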

forvd, May 12 '23 08:05

> Probably because the rotate_center used in BEVFormer is not at (0, 0). Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error? [image]

@zx2624 Hi! I have met the same problem. Have you solved it? Thanks.

fishmarch, May 24 '23 11:05

> > Probably because the rotate_center used in BEVFormer is not at (0, 0). Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error? [image]

> @zx2624 Hi! I have met the same problem. Have you solved it? Thanks.

@fishmarch Hi, I met the same problem. Have you solved this? Thanks.

swc-17, Jun 28 '23 02:06

> > @LegendBC Hi, I have added temporal features like BEVFormer, but the results of my experiment are not very good. Are you experimenting with temporal features in your code?

> We have tried the temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.

We have addressed the temporal issue in the latest MapTRv1 code. The issue is that the CAN bus provides extra harmful information in the temporal setting. We set the length of can_bus to 6 instead of the original 18 here.

LegendBC, Sep 14 '23 01:09

> @zxczrx123 Hi, I experimented on my own dataset. The metrics dropped a lot.

Hi, I would like to ask: how did you create your own dataset? Looking forward to your reply! Thanks!

colahe, Sep 15 '23 20:09

> > @zxczrx123 Hi, I experimented on my own dataset. The metrics dropped a lot.

> Hi, I would like to ask: how did you create your own dataset? Looking forward to your reply! Thanks!

Our dataset format is the same as nuScenes.

bitwangdan, Sep 18 '23 02:09

> > > @LegendBC Hi, I have added temporal features like BEVFormer, but the results of my experiment are not very good. Are you experimenting with temporal features in your code?

> > We have tried the temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.

> We have addressed the temporal issue in the latest MapTRv1 code. The issue is that the CAN bus provides extra harmful information in the temporal setting. We set the length of can_bus to 6 instead of the original 18 here.

Thank you for your reply, but it seems that the temporal information was not used when testing.

bitwangdan, Sep 18 '23 02:09

@LegendBC Hi, when using temporal information, video_test_mode needs to be True. I tried the new version of the temporal method, but the results were still not good.

bitwangdan, Sep 24 '23 07:09

> @LegendBC Hi, when using temporal information, video_test_mode needs to be True. I tried the new version of the temporal method, but the results were still not good.

We set video_test_mode=True when testing, and the result is consistent with when it was set to False.

zyc10ud, Sep 24 '23 13:09

@LegendBC Hi, I experimented with two temporal methods. The result with the GKT encoder is normal (mAP: 52.1), while the result with the BEVFormer encoder is not good (mAP: 25.1), so there may be some problems. By the way, how can lidar be integrated with temporal information? Looking forward to your reply.

bitwangdan, Oct 17 '23 01:10