
Is it possible to provide the config for centerpoint to train on waymo dataset?

Open ZZY816 opened this issue 2 years ago • 6 comments

After consulting the official OpenPCDet and CenterPoint code, I wrote a CenterPoint model config for training on Waymo. Strangely, the CenterPoint-Waymo model I trained with mmdet3D performs poorly. Can someone help me? Thanks!

Here is my config.

model config

voxel_size = [0.32, 0.32, 6]
model = dict(
    type='CenterPoint',
    pts_voxel_layer=dict(
        max_num_points=20, voxel_size=voxel_size, max_voxels=(32000, 32000)),
    pts_voxel_encoder=dict(
        type='PillarFeatureNet',
        in_channels=5,
        feat_channels=[64],
        with_distance=False,
        voxel_size=(0.32, 0.32, 6),
        norm_cfg=dict(type='BN1d', eps=1e-3, momentum=0.01),
        legacy=False),
    pts_middle_encoder=dict(
        type='PointPillarsScatter', in_channels=64, output_shape=(512, 512)),
    pts_backbone=dict(
        type='SECOND',
        in_channels=64,  # Notice change for multiframe
        out_channels=[64, 128, 256],
        layer_nums=[3, 5, 5],
        layer_strides=[1, 2, 2],
        norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
        conv_cfg=dict(type='Conv2d', bias=False)),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[64, 128, 256],
        out_channels=[128, 128, 128],
        upsample_strides=[1, 2, 4],
        norm_cfg=dict(type='BN', eps=1e-3, momentum=0.01),
        upsample_cfg=dict(type='deconv', bias=False),
        use_conv_for_no_stride=True),
    pts_bbox_head=dict(
        type='CenterHead',
        in_channels=sum([128, 128, 128]),  
        #in_channels=sum([128, 128, 128, 128, 128, 128]),
        tasks = [
            dict(num_class=2, class_names=['Car', 'Pedestrian']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            post_center_range=[-74.88, -74.88, -2, 74.88, 74.88, 4.0],
            max_num=500,
            score_threshold=0.1,
            out_size_factor=1,
            voxel_size=voxel_size[:2],
            code_size=7),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=2),
        norm_bbox=True),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            grid_size=[512, 512, 1],
            voxel_size=voxel_size,
            out_size_factor=1,
            dense_reg=1,
            gaussian_overlap=0.1,
            max_objs=500,
            min_radius=2,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])),
    test_cfg=dict(
        pts=dict(
            post_center_limit_range=[-80, -80, -10.0, 80, 80, 10.0],
            max_per_img=500,
            max_pool_nms=False,
            min_radius=[4, 12, 10, 1, 0.85, 0.175],
            score_threshold=0.1,
            pc_range=[-74.88, -74.88],
            out_size_factor=1,
            voxel_size=voxel_size[:2],
            nms_type='rotate',
            pre_max_size=4096,
            post_max_size=500,
            nms_thr=0.7)))

dataset config

data_root = ''
file_client_args = dict(backend='disk')

class_names = ['Car', 'Pedestrian']
point_cloud_range = [-74.88, -74.88, -2, 74.88, 74.88, 4]
input_modality = dict(use_lidar=True, use_camera=False)
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'waymo_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(filter_by_difficulty=[-1], filter_by_min_points=dict(Car=5, Pedestrian=10)),
    classes=class_names,
    sample_groups=dict(Car=15, Pedestrian=10),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=[0, 1, 2, 3, 4],
        file_client_args=file_client_args))

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_visibility=False,
        file_client_args=file_client_args),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]

eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=6,
        use_dim=5,
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=1,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'waymo_infos_train.pkl',
            split='training',
            pipeline=train_pipeline,
            modality=input_modality,
            classes=class_names,
            test_mode=False,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR',
            # load one frame every five frames
            load_interval=5)),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'waymo_infos_val.pkl',
        split='training',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'waymo_infos_val.pkl',
        split='training',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'))

evaluation = dict(interval=36, pipeline=eval_pipeline)

optimizer config

optimizer = dict(type='Adam', betas=(0.9, 0.99), amsgrad=0.0)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='OneCycle',
    max_lr=0.003,
    div_factor=10.0, pct_start=0.4,
)

runner = dict(type='EpochBasedRunner', max_epochs=36)

final config

_base_ = [
    '../_base_/models/centerpoint_02pillar_second_secfpn_waymo_2cls.py',
    '../_base_/datasets/waymoD5-3d-2cls.py',
    '../_base_/schedules/cyclic_30e.py',
    '../_base_/default_runtime.py',
]

point_cloud_range = [-74.88, -74.88, -2, 74.88, 74.88, 4.0]
model = dict(
    pts_voxel_layer=dict(point_cloud_range=point_cloud_range),
    pts_voxel_encoder=dict(point_cloud_range=point_cloud_range),
    pts_bbox_head=dict(bbox_coder=dict(pc_range=point_cloud_range[:2])),
    # model training and testing settings
    train_cfg=dict(pts=dict(point_cloud_range=point_cloud_range)),
    test_cfg=dict(pts=dict(pc_range=point_cloud_range[:2])))

ZZY816 avatar Jul 13 '22 11:07 ZZY816

maybe try

voxel_size = [0.2, 0.2, 6]
tasks = [
    dict(num_class=1, class_names=['Car']),
    dict(num_class=1, class_names=['Pedestrian']),
],
train=dict(
    type='RepeatDataset',
    times=2)

Tartisan avatar Jul 13 '22 12:07 Tartisan

Thank you very much, I will try your suggestion! Reducing the voxel size and using two heads to improve performance seems very reasonable. Meanwhile, I still wonder why my config leads to such poor performance (0.3-0.5 AP), which is far below the official results. Note that my config is very similar to the official CenterPoint config. I am also sure that there is no problem with my data or evaluation, because I successfully trained the PointPillars model on Waymo and achieved the expected performance.

ZZY816 avatar Jul 13 '22 13:07 ZZY816

Hi @ZZY816,

Have you tried the new config and what are the new results? I would appreciate it if you could help answer this question! 👍🏼

RunpeiDong avatar Jul 29 '22 05:07 RunpeiDong

@RunpeiDong After weeks of checking, I finally found the cause. The poor model performance is mainly caused by the intensity values in the Waymo data. Waymo intensity ranges from 0 to 40000 and should be normalized. Adding the following line to class LoadPointsFromFile(object) (line 425 in loading.py) solves the problem.

points[:, 3] = np.tanh(points[:, 3])
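
For anyone who wants to check the effect of this normalization outside the loader, here is a minimal standalone sketch. It assumes the points are an (N, C) NumPy array with intensity in column 3, as in the line above. The helper name normalize_intensity and the toy values are hypothetical and not part of mmdet3d.

```python
import numpy as np


def normalize_intensity(points: np.ndarray) -> np.ndarray:
    """Return a float32 copy of `points` with column 3 (intensity) squashed by tanh.

    Hypothetical helper for illustration; it mirrors the one-line fix
    `points[:, 3] = np.tanh(points[:, 3])` but does not modify its input.
    """
    out = points.astype(np.float32, copy=True)
    out[:, 3] = np.tanh(out[:, 3])
    return out


# Toy point cloud: x, y, z, intensity, and one extra feature column.
pts = np.array(
    [
        [1.0, 2.0, 0.5, 0.0, 0.1],
        [3.0, -1.0, 0.2, 0.5, 0.0],
        [5.0, 4.0, 1.0, 40000.0, 0.3],  # very large raw intensity
    ],
    dtype=np.float32,
)

norm = normalize_intensity(pts)
print(norm[:, 3])  # all intensity values now lie in [0, 1]
```

Since tanh is monotonic and bounded, the ordering of intensities is preserved while very large raw values saturate near 1, so they no longer dominate the other input features.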

Also, the output_shape and grid_size in my config were incorrect. They should be (468, 468) and [468, 468, 1] rather than (512, 512) and [512, 512, 1].
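
The corrected values follow from dividing the extent of point_cloud_range by voxel_size along each axis. A minimal sketch of that arithmetic, using the range and voxel size from the config above (the helper name grid_size is hypothetical):

```python
def grid_size(point_cloud_range, voxel_size):
    """Number of voxels along x, y, z for a given range and voxel size.

    Hypothetical helper for illustration. round() absorbs floating-point
    error in the division (e.g. 149.76 / 0.32).
    """
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    extents = (x_max - x_min, y_max - y_min, z_max - z_min)
    return [round(extent / size) for extent, size in zip(extents, voxel_size)]


point_cloud_range = [-74.88, -74.88, -2, 74.88, 74.88, 4]
voxel_size = [0.32, 0.32, 6]

nx, ny, nz = grid_size(point_cloud_range, voxel_size)
print([nx, ny, nz])
```

Running the same check whenever voxel_size or point_cloud_range changes is a cheap way to keep output_shape and grid_size in sync with the rest of the config.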

ZZY816 avatar Jul 29 '22 12:07 ZZY816

Hi @ZZY816, Thanks very much for your hard work and valuable answer. Great job!

RunpeiDong avatar Jul 29 '22 17:07 RunpeiDong

Does out_size_factor need to be set to 8 rather than 1?
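
One rough way to reason about this is to compare out_size_factor with the net stride of the feature map that reaches the head. The sketch below assumes that the entries of layer_strides in the backbone compound stage by stage and that the neck upsamples each level by the matching entry of upsample_strides. That is a reading of the config above, not something verified against the library source, and the helper name neck_output_strides is hypothetical.

```python
def neck_output_strides(layer_strides, upsample_strides):
    """Net stride of each neck output relative to the BEV input grid.

    Assumption (not checked against the library): backbone strides compound
    across stages, and each level is then upsampled by its own factor.
    """
    strides = []
    cumulative = 1
    for down, up in zip(layer_strides, upsample_strides):
        cumulative *= down  # stride of this backbone stage w.r.t. the input
        strides.append(cumulative / up)  # stride after the neck upsamples it
    return strides


# Values taken from the model config in this thread.
strides = neck_output_strides(layer_strides=[1, 2, 2], upsample_strides=[1, 2, 4])
print(strides)

# Heatmap size implied by the corrected grid size and out_size_factor=1.
grid = 468
out_size_factor = 1
heatmap = grid // out_size_factor
print(heatmap)
```

Under those assumptions every neck output stays at the input BEV resolution, which would be consistent with out_size_factor=1 for this pillar config. A value of 8 would only fit a backbone and neck whose output is downsampled 8 times relative to the input grid.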

xpyqiubai avatar Sep 01 '22 07:09 xpyqiubai

@ZZY816 Hello! Thank you for your valuable comments. Could you tell me what the performance is after you normalize the intensity and correct the config?

rkotimi avatar Nov 22 '22 04:11 rkotimi

Does anyone have model weights for the Waymo dataset? Would it be possible to share them?

AV-adrian avatar Jun 26 '23 21:06 AV-adrian