MapTR
increase point_cloud_range
Hi, thanks for your great work! I want to increase the range to [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]. I modified `MapTRNMSFreeCoder` and `point_cloud_range`, but the results are not very good. Can you give some suggestions?
@bitwangdan Hi, I had the same problem before. There is a bug when you use an asymmetric range: the patch_size of the LocalMap is (w/2, h/2), but the origin is still (0, 0). You can check the code below.
https://github.com/hustvl/MapTR/blob/3b0c7d6b634193657023ebf755f6628b08100806/projects/mmdet3d_plugin/datasets/nuscenes_map_dataset.py#L85
https://github.com/hustvl/MapTR/blob/3b0c7d6b634193657023ebf755f6628b08100806/projects/mmdet3d_plugin/datasets/nuscenes_map_dataset.py#L828
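For illustration, a minimal sketch (variable names assumed, not the repo's exact code) of deriving both the patch size and the patch center from an asymmetric `point_cloud_range`, instead of hard-coding the origin at (0, 0):

```python
# Sketch only: for an asymmetric range, the patch center is NOT the ego origin.
pc_range = [-15.0, -30.0, -2.0, 60.0, 30.0, 2.0]  # [x_min, y_min, z_min, x_max, y_max, z_max]

patch_w = pc_range[3] - pc_range[0]            # 75.0 m in x
patch_h = pc_range[4] - pc_range[1]            # 60.0 m in y
# The patch center in ego coordinates is (0, 0) only for a symmetric range;
# here it sits 22.5 m ahead of the ego.
center_x = (pc_range[3] + pc_range[0]) / 2.0   # 22.5
center_y = (pc_range[4] + pc_range[1]) / 2.0   # 0.0

# nuScenes-map-API-style patch box: (center_x, center_y, height, width).
# The real code would additionally offset this by the ego pose in global coordinates.
patch_box = (center_x, center_y, patch_h, patch_w)
```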
@zxczrx123 Thanks, I had found this problem. There is one more point to pay attention to: the parameter num_vec needs to be increased.
@bitwangdan By the way, increasing the range greatly increases the complexity of the instances, and a method with a fixed num_vec seems to have difficulty dealing with lines of different lengths.
@zxczrx123 Yes, when I increase the data range, the metrics for many categories drop significantly. I have not found a better way other than increasing the parameter num_vec.
@zxczrx123 Hi, have you tried adding temporal features like BEVFormer? The results of my experiments are not very good.
@LegendBC Hi, I have added temporal features like BEVFormer, but the results of my experiments are not very good. Are you experimenting with temporal features in your code?
We have tried the temporal fusion for MapTR and found that it deteriorated the accuracy, so we removed it.
@LegendBC Thank you for your reply. I have added lidar information, and the mAP on my dataset improved a lot. When I add temporal information like BEVFormer, the mAP drops a lot; maybe this temporal fusion method is not suitable for MapTR. I will try other temporal methods, and I also hope you can find a suitable temporal method for MapTR.
@bitwangdan I have not used temporal features. Can you share your results?
@zxczrx123 Hi, I experimented on my own dataset; the metrics dropped a lot.
@zxczrx123 Hi, have you ever encountered this situation? After increasing the point_cloud_range, mAP drops a lot at the 0.5 threshold, but at the 1.0 and 1.5 thresholds it is basically normal.
@bitwangdan In my case, the metrics drop at all thresholds.
Probably because the rotate_center used in BEVFormer is not at (0, 0).
Anyway, when adding temporal fusion (queue_length = 3), I get a runtime error. Have you got the same error?
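For reference, a hedged sketch of what recomputing the rotation center could look like. In BEVFormer-style temporal fusion, prev_bev is rotated by the ego yaw change around rotate_center, which defaults to the grid center and is correct only for a symmetric range. The names and the h↔y, w↔x axis mapping below are assumptions, not the repo's exact code:

```python
import torch
from torchvision.transforms.functional import rotate

def ego_cell(pc_range, bev_h, bev_w):
    """Grid cell of the ego (x=0, y=0). Equals the grid center only when the
    range is symmetric. Axis convention (h <-> y, w <-> x) is assumed here."""
    x_min, y_min, _, x_max, y_max, _ = pc_range
    row = (0.0 - y_min) / (y_max - y_min) * bev_h
    col = (0.0 - x_min) / (x_max - x_min) * bev_w
    return [int(col), int(row)]  # torchvision's `center` is (x, y) in pixels

# prev_bev as a (C, bev_h, bev_w) feature map, rotated by the ego yaw change:
prev_bev = torch.randn(256, 400, 100)
rotate_center = ego_cell([-15.0, -60.0, -2.0, 15.0, 60.0, 2.0], 400, 100)
prev_bev = rotate(prev_bev, angle=5.0, center=rotate_center)
```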
@zx2624 Debug with `TORCH_DISTRIBUTED_DEBUG=DETAIL bash tools/dist_train.sh **config`. I adjusted this parameter (rotate_center), but the results are still wrong.
any code?
Same problem. Here is my cfg:

```python
_base_ = [
    '../datasets/custom_nus-3d.py',
    '../_base_/default_runtime.py'
]
plugin = True
plugin_dir = 'projects/mmdet3d_plugin/'

# If point cloud range is changed, the models should also change their point
# cloud range accordingly
# point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
point_cloud_range = [-15.0, -60.0, -2.0, 15.0, 60.0, 2.0]
voxel_size = [0.15, 0.15, 4]

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# For nuScenes we usually do 10-class detection
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]

# map has classes: divider, ped_crossing, boundary
map_classes = ['divider', 'ped_crossing', 'boundary']
# fixed_ptsnum_per_line = 20
# map_classes = ['divider',]
fixed_ptsnum_per_gt_line = 40  # now only support fixed_pts > 0
fixed_ptsnum_per_pred_line = 40
eval_use_same_gt_sample_num_flag = True
num_map_classes = len(map_classes)

input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=True)

_dim_ = 256
_pos_dim_ = _dim_ // 2
_ffn_dim_ = _dim_ * 2
_num_levels_ = 1
# bev_h_ = 50
# bev_w_ = 50
bev_h_ = 400
bev_w_ = 100
queue_length = 1  # each sequence contains `queue_length` frames.

model = dict(
    type='MapTR',
    use_grid_mask=True,
    video_test_mode=False,
    pretrained=dict(img='ckpts/resnet50-19c8e357.pth'),
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(3,),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch'),
    img_neck=dict(
        type='FPN',
        in_channels=[2048],
        out_channels=_dim_,
        start_level=0,
        add_extra_convs='on_output',
        num_outs=_num_levels_,
        relu_before_extra_convs=True),
    pts_bbox_head=dict(
        type='MapTRHead',
        bev_h=bev_h_,
        bev_w=bev_w_,
        num_query=900,
        num_vec=100,
        num_pts_per_vec=fixed_ptsnum_per_pred_line,  # one bbox
        num_pts_per_gt_vec=fixed_ptsnum_per_gt_line,
        dir_interval=1,
        query_embed_type='instance_pts',
        transform_method='minmax',
        gt_shift_pts_pattern='v2',
        num_classes=num_map_classes,
        in_channels=_dim_,
        sync_cls_avg_factor=True,
        with_box_refine=True,
        as_two_stage=False,
        code_size=2,
        code_weights=[1.0, 1.0, 1.0, 1.0],
        transformer=dict(
            type='MapTRPerceptionTransformer',
            rotate_prev_bev=True,
            use_shift=True,
            use_can_bus=True,
            embed_dims=_dim_,
            encoder=dict(
                type='BEVFormerEncoder',
                num_layers=1,
                pc_range=point_cloud_range,
                num_points_in_pillar=4,
                return_intermediate=False,
                transformerlayers=dict(
                    type='BEVFormerLayer',
                    attn_cfgs=[
                        dict(
                            type='TemporalSelfAttention',
                            embed_dims=_dim_,
                            num_levels=1),
                        dict(
                            type='GeometrySptialCrossAttention',
                            pc_range=point_cloud_range,
                            attention=dict(
                                type='GeometryKernelAttention',
                                embed_dims=_dim_,
                                num_heads=4,
                                dilation=1,
                                kernel_size=(3, 5),
                                num_levels=_num_levels_),
                            embed_dims=_dim_,
                        )
                    ],
                    feedforward_channels=_ffn_dim_,
                    ffn_dropout=0.1,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm'))),
            decoder=dict(
                type='MapTRDecoder',
                num_layers=6,
                return_intermediate=True,
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=_dim_,
                            num_heads=8,
                            dropout=0.1),
                        dict(
                            type='CustomMSDeformableAttention',
                            embed_dims=_dim_,
                            num_levels=1),
                    ],
                    feedforward_channels=_ffn_dim_,
                    ffn_dropout=0.1,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')))),
        bbox_coder=dict(
            type='MapTRNMSFreeCoder',
            # post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            post_center_range=[-20, -65, -20, -65, 20, 65, 20, 65],
            pc_range=point_cloud_range,
            max_num=50,
            voxel_size=voxel_size,
            num_classes=num_map_classes),
        positional_encoding=dict(
            type='LearnedPositionalEncoding',
            num_feats=_pos_dim_,
            row_num_embed=bev_h_,
            col_num_embed=bev_w_,
        ),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=0.0),
        loss_iou=dict(type='GIoULoss', loss_weight=0.0),
        loss_pts=dict(type='PtsL1Loss', loss_weight=5.0),
        loss_dir=dict(type='PtsDirCosLoss', loss_weight=0.005)),
    # model training and testing settings
    train_cfg=dict(pts=dict(
        grid_size=[512, 512, 1],
        voxel_size=voxel_size,
        point_cloud_range=point_cloud_range,
        out_size_factor=4,
        assigner=dict(
            type='MapTRAssigner',
            cls_cost=dict(type='FocalLossCost', weight=2.0),
            reg_cost=dict(type='BBoxL1Cost', weight=0.0, box_format='xywh'),
            # reg_cost=dict(type='BBox3DL1Cost', weight=0.25),
            # iou_cost=dict(type='IoUCost', weight=1.0), # Fake cost. This is just to make it compatible with DETR head.
            iou_cost=dict(type='IoUCost', iou_mode='giou', weight=0.0),
            pts_cost=dict(type='OrderedPtsL1Cost', weight=5),
            pc_range=point_cloud_range))))

dataset_type = 'CustomNuScenesLocalMapDataset'
data_root = 'data/nuscenes/'
file_client_args = dict(backend='disk')

train_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='LoadAnnotations3D', with_bbox_3d=True,
         with_label_3d=True, with_attr_label=False),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(type='RandomScaleImageMultiViewImage', scales=[0.5]),
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])
]

test_pipeline = [
    dict(type='LoadMultiViewImageFromFiles', to_float32=True),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1600, 900),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(type='RandomScaleImageMultiViewImage', scales=[0.5]),
            dict(type='PadMultiViewImage', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='CustomCollect3D', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_temporal_train.pkl',
        pipeline=train_pipeline,
        classes=class_names,
        modality=input_modality,
        test_mode=False,
        use_valid_flag=True,
        bev_size=(bev_h_, bev_w_),
        pc_range=point_cloud_range,
        fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
        eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
        padding_value=-10000,
        map_classes=map_classes,
        queue_length=queue_length,
        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
        # and box_type_3d='Depth' in sunrgbd and scannet dataset.
        box_type_3d='LiDAR'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_temporal_val.pkl',
        map_ann_file=data_root + 'nuscenes_map_anns_val.json',
        pipeline=test_pipeline,
        bev_size=(bev_h_, bev_w_),
        pc_range=point_cloud_range,
        fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
        eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
        padding_value=-10000,
        map_classes=map_classes,
        classes=class_names,
        modality=input_modality,
        samples_per_gpu=1),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_temporal_val.pkl',
        map_ann_file=data_root + 'nuscenes_map_anns_val.json',
        pipeline=test_pipeline,
        bev_size=(bev_h_, bev_w_),
        pc_range=point_cloud_range,
        fixed_ptsnum_per_line=fixed_ptsnum_per_gt_line,
        eval_use_same_gt_sample_num_flag=eval_use_same_gt_sample_num_flag,
        padding_value=-10000,
        map_classes=map_classes,
        classes=class_names,
        modality=input_modality),
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler')
)

optimizer = dict(
    type='AdamW',
    lr=6e-4,
    paramwise_cfg=dict(
        custom_keys={
            'img_backbone': dict(lr_mult=0.1),
        }),
    weight_decay=0.01)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)
# total_epochs = 24
total_epochs = 50
# evaluation = dict(interval=1, pipeline=test_pipeline)
evaluation = dict(interval=2, pipeline=test_pipeline, metric='chamfer')

runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
fp16 = dict(loss_scale=512.)
checkpoint_config = dict(interval=1)
```

Can you give some suggestions?
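One quick sanity check worth running on a config like this (an illustrative snippet; the h↔y, w↔x axis mapping is an assumption): confirm that the BEV grid resolution stays uniform after changing the range.

```python
# With this config, each BEV cell covers 0.3 m x 0.3 m in both axes.
pc_range = [-15.0, -60.0, -2.0, 15.0, 60.0, 2.0]
bev_h_, bev_w_ = 400, 100

cell_y = (pc_range[4] - pc_range[1]) / bev_h_   # 120 m / 400 = 0.30 m
cell_x = (pc_range[3] - pc_range[0]) / bev_w_   #  30 m / 100 = 0.30 m
assert abs(cell_x - cell_y) < 1e-6, "anisotropic BEV cells"
```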
@zx2624 Hi! I met the same problem. Have you solved it? Thanks.
@fishmarch Hi, I met the same problem. Have you solved it? Thanks.
We have addressed the temporal issue in the latest MapTRv1 code. The issue is that the can bus provides extra harmful information in the temporal setting. We set the length of can_bus to 6 instead of the original 18 here.
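Roughly, the change amounts to something like the following (a hedged sketch with assumed names and layer sizes, not the exact repo diff): only the first 6 can_bus values are embedded, instead of the full 18-dim signal.

```python
import torch
import torch.nn as nn

can_bus_len = 6   # was 18; the remaining entries were found harmful here
_dim_ = 256

# can-bus embedding MLP, sized for the truncated signal (sizes assumed)
can_bus_mlp = nn.Sequential(
    nn.Linear(can_bus_len, _dim_ // 2),
    nn.ReLU(inplace=True),
    nn.Linear(_dim_ // 2, _dim_),
    nn.ReLU(inplace=True),
)

can_bus = torch.randn(1, 18)                             # full signal from the dataset
can_bus_feat = can_bus_mlp(can_bus[..., :can_bus_len])   # embed only the first 6
```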
@bitwangdan Hi, I would like to ask how you created your own dataset. Looking forward to your reply! Thanks!
Our dataset format is the same as nuScenes.
@LegendBC Thank you for your reply, but it seems that the temporal fusion is not used when testing.
@LegendBC Hi, when using temporal fusion, video_test_mode needs to be True. I tried the new version of the temporal method, but the results were still not good.
We set video_test_mode=True when we test; the result is consistent with when it was set to False.
@LegendBC Hi, I experimented with two temporal methods. The result with the GKT encoder is normal (mAP: 52.1), but the result with the BEVFormer encoder is not good (mAP: 25.1), so there may be some problems. By the way, how should lidar be integrated with temporal fusion? Looking forward to your reply.