MVXNet fusion does not work (image features are not used)
Help!
When I input different images with the same point cloud data from KITTI into MVXNet, the results do not change. I have tried to find the reason:

Is there a coding error in the MVXNet config? In the MVXNet config, I cannot see any settings for images or fusion.

or

In class MVXTwoStageDetector, the function simple_test cannot use image features because the property with_img_bbox is always False?

Why? And if so, what is the meaning of the image features? Looking forward to your response.
This is the MVXNet config:
```python
_base_ = ['../_base_/schedules/cosine.py', '../_base_/default_runtime.py']

# model settings
voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]

model = dict(
    type='DynamicMVXFasterRCNN',
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    img_neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    pts_voxel_layer=dict(
        max_num_points=-1,
        point_cloud_range=point_cloud_range,
        voxel_size=voxel_size,
        max_voxels=(-1, -1),
    ),
    pts_voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=4,
        feat_channels=[64, 64],
        with_distance=False,
        voxel_size=voxel_size,
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=point_cloud_range,
        fusion_layer=dict(
            type='PointFusion',
            img_channels=256,
            pts_channels=64,
            mid_channels=128,
            out_channels=128,
            img_levels=[0, 1, 2, 3, 4],
            align_corners=False,
            activate_out=True,
            fuse_out=False)),
    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=128,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act')),
    pts_backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        out_channels=[128, 256]),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[128, 256],
        upsample_strides=[1, 2],
        out_channels=[256, 256]),
    pts_bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=512,
        feat_channels=512,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='Anchor3DRangeGenerator',
            ranges=[
                [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                [0, -40.0, -1.78, 70.4, 40.0, -1.78],
            ],
            sizes=[[0.6, 0.8, 1.73], [0.6, 1.76, 1.73], [1.6, 3.9, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        assigner_per_size=True,
        diff_rad_by_sin=True,
        assign_per_class=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            assigner=[
                dict(  # for Pedestrian
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(  # for Cyclist
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(  # for Car
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.45,
                    min_pos_iou=0.45,
                    ignore_iof_thr=-1),
            ],
            allowed_border=0,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        pts=dict(
            use_rotate_nms=True,
            nms_across_levels=False,
            nms_thr=0.01,
            score_thr=0.1,
            min_bbox_size=0,
            nms_pre=100,
            max_num=50)))
```
As you can see, there is no config for images. Compare to ImVoteNet's config:
```python
    pts=dict(
        vote_module_cfg=dict(
            in_channels=256,
            vote_per_seed=1,
            gt_per_seed=3,
            conv_channels=(256, 256),
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='BN1d'),
            norm_feats=True,
            vote_loss=dict(
                type='ChamferDistance',
                mode='l1',
                reduction='none',
                loss_dst_weight=10.0)),
        vote_aggregation_cfg=dict(
            type='PointSAModule',
            num_point=256,
            radius=0.3,
            num_sample=16,
            mlp_channels=[256, 128, 128, 128],
            use_xyz=True,
            normalize_xyz=True)),
    img=dict(
        vote_module_cfg=dict(
            in_channels=256,
            vote_per_seed=1,
            gt_per_seed=3,
            conv_channels=(256, 256),
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='BN1d'),
            norm_feats=True,
            vote_loss=dict(
                type='ChamferDistance',
                mode='l1',
                reduction='none',
                loss_dst_weight=10.0)),
        vote_aggregation_cfg=dict(
            type='PointSAModule',
            num_point=256,
            radius=0.3,
            num_sample=16,
            mlp_channels=[256, 128, 128, 128],
            use_xyz=True,
            normalize_xyz=True)),
    loss_weights=[0.4, 0.3, 0.3]),
img_mlp=dict(
    in_channel=18,
    conv_channels=(256, 256),
    conv_cfg=dict(type='Conv1d'),
    norm_cfg=dict(type='BN1d'),
    act_cfg=dict(type='ReLU')),
fusion_layer=dict(
    type='VoteFusion',
    num_classes=len(class_names),
    max_imvote_per_pixel=3),
```
If with_img_bbox is true, it means the model will output detection results from the image branch. However, MVXNet is not a model that directly fuses the detection results of the point cloud branch and the image branch (as a post-process); it uses the image features by fusing them with the point features during voxelization. You can see the details in https://github.com/open-mmlab/mmdetection3d/blob/86cc487cca2eb332ad4e6adf2fff8c879ffc2115/mmdet3d/models/detectors/mvx_faster_rcnn.py#L49
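To make that data flow concrete, here is a rough sketch of MVXNet's feature extraction path (a paraphrase for illustration only, not the actual mmdet3d source; treat the method names as approximate):

```python
def extract_feat_sketch(model, points, img, img_metas):
    """Rough sketch of how MVXNet consumes image features."""
    # Image branch: ResNet + FPN only produce multi-scale feature maps;
    # there is no image detection head.
    img_feats = model.img_neck(model.img_backbone(img))

    # Point branch: DynamicVFE projects each point into the image plane,
    # samples image features there (the PointFusion layer in the config),
    # and fuses them with the point features during voxelization.
    voxels, coors = model.voxelize(points)
    voxel_features, feature_coors = model.pts_voxel_encoder(
        voxels, coors, points, img_feats, img_metas)

    # Everything downstream (SparseEncoder, SECOND, Anchor3DHead) sees only
    # the fused voxel features, which is why with_img_bbox is False and
    # simple_test never produces image-branch boxes.
    return voxel_features, feature_coors
```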
Thank you for your reply. However, I still have a problem.
When I input different images with the same point data from KITTI into MVXNet, the results do not change. I have checked the code: the features from the image do change, but the result does not.
How can I get the right results?
(the original results)
(I have changed the image to this one, but it has no effect on the result)
It's strange. I recommend checking the intermediate variables (like the constructed voxel features) to see whether they are the same or not.
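For example (a debugging sketch only: it assumes the demo helpers init_model and inference_multi_modality_detector from mmdet3d.apis, and the attribute path pts_voxel_encoder.fusion_layer implied by the config above), you could hook the fusion layer and diff its output between two runs:

```python
import torch
from mmdet3d.apis import init_model, inference_multi_modality_detector

captured = []

def grab_fused_feats(module, inputs, output):
    # PointFusion's output: per-point features after image/point fusion.
    captured.append(output.detach().cpu())

model = init_model(
    'configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py',
    'checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth',
    device='cuda:0')
model.pts_voxel_encoder.fusion_layer.register_forward_hook(grab_fused_feats)

for img in ('demo/data/kitti/kitti_000008.png', 'some_other_image.png'):
    inference_multi_modality_detector(
        model, 'demo/data/kitti/kitti_000008.bin', img,
        'demo/data/kitti/kitti_000008_infos.pkl')

# Same point cloud, different images: if fusion works, this should be > 0.
print('max abs diff:', (captured[0] - captured[1]).abs().max().item())
```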
Emmm... maybe it's not my problem. There is no need to modify the code; you can easily reproduce my bug by replacing the picture in the demo with another picture of the same size.
```shell
python demo/multi_modality_demo.py demo/data/kitti/kitti_000008.bin demo/data/kitti/kitti_000008.png demo/data/kitti/kitti_000008_infos.pkl configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth
```

Just replace kitti_000008.png with kitti_000009.png or any other picture.
https://github.com/open-mmlab/mmdetection3d/blob/master/docs/zh_cn/demo.md
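For reference, the same repro as a Python sketch (hedged: it assumes the demo helpers init_model and inference_multi_modality_detector from mmdet3d.apis and the usual pts_bbox result format; adjust the paths and the second image to your setup):

```python
import torch
from mmdet3d.apis import init_model, inference_multi_modality_detector

model = init_model(
    'configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py',
    'checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth',
    device='cuda:0')

boxes = []
for img in ('demo/data/kitti/kitti_000008.png', 'demo/data/kitti/kitti_000009.png'):
    result, _ = inference_multi_modality_detector(
        model, 'demo/data/kitti/kitti_000008.bin', img,
        'demo/data/kitti/kitti_000008_infos.pkl')
    boxes.append(result[0]['pts_bbox']['boxes_3d'].tensor)

# True here reproduces the bug: changing the image did not change the boxes.
print(torch.equal(boxes[0], boxes[1]))
```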
I look forward to your help.
We will reproduce the problem ASAP.
Yes, I encountered the same problem. I think the model does not rely heavily on the image features.
Sorry, I don't quite understand what you mean. Is this a problem or a feature of the model?
@853108389 I think it is a problem.
> We will reproduce the problem ASAP.

Thank you for the fusion model you provided. I hope you can notify me here after you fix this bug.
Hello, I would like to ask how you got the kitti_000008_infos.pkl file when you ran demo.py for MVX-Net.
Although I know that it is generated from the annotation information, the official docs do not seem to give the specific format of this pkl file, or any conversion script, etc.
If possible, could you please send me a copy of the pkl file you generated? I would be very grateful. This is my QQ email: [email protected]. Thank you very much.
@Chenfanqing, follow this doc to convert KITTI data to pkl files: https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/kitti_det.md
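Concretely, the conversion command from that doc (run from the repo root after arranging the raw KITTI files as described there) generates the kitti_infos_*.pkl files:

```shell
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```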
I know it is quite late, but one way to prove this problem is to multiply the input image by zero after it is read by the code and converted to a numpy array:

```python
x1 = np.array(input_lidar_points)  # shape (-1, 4)
x2 = np.array(input_pixels_image)  # shape (-1, 3)

# I suggest doing the following, so the input image is an array of zeros:
x2 = x2 * 0

# And do another experiment with the lidar points array as zeros:
x1 = x1 * 0
```

I think this will clear up this issue.
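A concrete way to run the ablation suggested above without modifying any model code (a sketch; mmcv.imread/mmcv.imwrite are standard mmcv helpers, and the zeroed filename is only an example):

```python
import mmcv
import numpy as np

# Write an all-zero image with the same shape and dtype as the original.
img = mmcv.imread('demo/data/kitti/kitti_000008.png')
mmcv.imwrite(np.zeros_like(img), 'demo/data/kitti/kitti_000008_zeros.png')

# Now run multi_modality_demo.py (or inference_multi_modality_detector)
# once with the original image and once with the zeroed one, and compare
# the predicted boxes. If they are identical, the image branch has no
# effect; a zeroed point cloud .bin can test the lidar side the same way.
```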
@Tai-Wang, @VVsssssk, by the way, is this issue solved?