MVXNet fusion does not work (image features are not used)
Help!
When I input different images with the same point cloud data from KITTI into MVXNet, the results do not change. I have tried to find the reason:

Is there a coding error in the MVXNet config? In the MVXNet config, I cannot see any settings for images or fusion.

or

In class MVXTwoStageDetector, the function simple_test cannot use image features because the property with_img_bbox is always False?

Why? And if so, what is the meaning of the image features? Looking forward to your response.
This is the MVXNet config:
```python
_base_ = ['../_base_/schedules/cosine.py', '../_base_/default_runtime.py']

# model settings
voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]

model = dict(
    type='DynamicMVXFasterRCNN',
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    img_neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    pts_voxel_layer=dict(
        max_num_points=-1,
        point_cloud_range=point_cloud_range,
        voxel_size=voxel_size,
        max_voxels=(-1, -1),
    ),
    pts_voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=4,
        feat_channels=[64, 64],
        with_distance=False,
        voxel_size=voxel_size,
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=point_cloud_range,
        fusion_layer=dict(
            type='PointFusion',
            img_channels=256,
            pts_channels=64,
            mid_channels=128,
            out_channels=128,
            img_levels=[0, 1, 2, 3, 4],
            align_corners=False,
            activate_out=True,
            fuse_out=False)),
    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=128,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act')),
    pts_backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        out_channels=[128, 256]),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[128, 256],
        upsample_strides=[1, 2],
        out_channels=[256, 256]),
    pts_bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=512,
        feat_channels=512,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='Anchor3DRangeGenerator',
            ranges=[
                [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                [0, -40.0, -1.78, 70.4, 40.0, -1.78],
            ],
            sizes=[[0.6, 0.8, 1.73], [0.6, 1.76, 1.73], [1.6, 3.9, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        assigner_per_size=True,
        diff_rad_by_sin=True,
        assign_per_class=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            assigner=[
                dict(  # for Pedestrian
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(  # for Cyclist
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(  # for Car
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.45,
                    min_pos_iou=0.45,
                    ignore_iof_thr=-1),
            ],
            allowed_border=0,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        pts=dict(
            use_rotate_nms=True,
            nms_across_levels=False,
            nms_thr=0.01,
            score_thr=0.1,
            min_bbox_size=0,
            nms_pre=100,
            max_num=50)))
```
As you can see, there is no config for images. Compare to ImVoteNet's config:
```python
    pts=dict(
        vote_module_cfg=dict(
            in_channels=256,
            vote_per_seed=1,
            gt_per_seed=3,
            conv_channels=(256, 256),
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='BN1d'),
            norm_feats=True,
            vote_loss=dict(
                type='ChamferDistance',
                mode='l1',
                reduction='none',
                loss_dst_weight=10.0)),
        vote_aggregation_cfg=dict(
            type='PointSAModule',
            num_point=256,
            radius=0.3,
            num_sample=16,
            mlp_channels=[256, 128, 128, 128],
            use_xyz=True,
            normalize_xyz=True)),
    img=dict(
        vote_module_cfg=dict(
            in_channels=256,
            vote_per_seed=1,
            gt_per_seed=3,
            conv_channels=(256, 256),
            conv_cfg=dict(type='Conv1d'),
            norm_cfg=dict(type='BN1d'),
            norm_feats=True,
            vote_loss=dict(
                type='ChamferDistance',
                mode='l1',
                reduction='none',
                loss_dst_weight=10.0)),
        vote_aggregation_cfg=dict(
            type='PointSAModule',
            num_point=256,
            radius=0.3,
            num_sample=16,
            mlp_channels=[256, 128, 128, 128],
            use_xyz=True,
            normalize_xyz=True)),
    loss_weights=[0.4, 0.3, 0.3]),
img_mlp=dict(
    in_channel=18,
    conv_channels=(256, 256),
    conv_cfg=dict(type='Conv1d'),
    norm_cfg=dict(type='BN1d'),
    act_cfg=dict(type='ReLU')),
fusion_layer=dict(
    type='VoteFusion',
    num_classes=len(class_names),
    max_imvote_per_pixel=3),
```
If with_img_bbox is true, it means the model will output detection results from the image branch. However, MVXNet is not a model that directly fuses the detection results of the point cloud branch and the image branch (as a post-process); it uses the image features by fusing them with the point features during voxelization. You can see the details in https://github.com/open-mmlab/mmdetection3d/blob/86cc487cca2eb332ad4e6adf2fff8c879ffc2115/mmdet3d/models/detectors/mvx_faster_rcnn.py#L49
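To make that data flow concrete, here is a rough sketch of MVXNet's feature extraction path (a paraphrase for illustration only, not the actual mmdet3d source; treat the method names as approximate):

```python
def extract_feat_sketch(model, points, img, img_metas):
    """Rough sketch of how MVXNet consumes image features."""
    # Image branch: ResNet + FPN only produce multi-scale feature maps;
    # there is no image detection head.
    img_feats = model.img_neck(model.img_backbone(img))

    # Point branch: DynamicVFE projects each point into the image plane,
    # samples image features there (the PointFusion layer in the config),
    # and fuses them with the point features during voxelization.
    voxels, coors = model.voxelize(points)
    voxel_features, feature_coors = model.pts_voxel_encoder(
        voxels, coors, points, img_feats, img_metas)

    # Everything downstream (SparseEncoder, SECOND, Anchor3DHead) sees only
    # the fused voxel features, which is why with_img_bbox is False and
    # simple_test never produces image-branch boxes.
    return voxel_features, feature_coors
```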
Thank you for your reply. However, I still have a problem.
When I input different images with the same point data from KITTI into MVXNet, the results do not change. I have checked the code: the features from the image do change, but the result does not.
How can I get the right results?
(the original results)
(I have changed the image to this one, but it has no effect on the result)
It's strange. I recommend checking the intermediate variables (like the constructed voxel features) to see whether they are the same or not.
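For example (a debugging sketch only: it assumes the demo helpers init_model and inference_multi_modality_detector from mmdet3d.apis, and the attribute path pts_voxel_encoder.fusion_layer implied by the config above), you could hook the fusion layer and diff its output between two runs:

```python
import torch
from mmdet3d.apis import init_model, inference_multi_modality_detector

captured = []

def grab_fused_feats(module, inputs, output):
    # PointFusion's output: per-point features after image/point fusion.
    captured.append(output.detach().cpu())

model = init_model(
    'configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py',
    'checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth',
    device='cuda:0')
model.pts_voxel_encoder.fusion_layer.register_forward_hook(grab_fused_feats)

for img in ('demo/data/kitti/kitti_000008.png', 'some_other_image.png'):
    inference_multi_modality_detector(
        model, 'demo/data/kitti/kitti_000008.bin', img,
        'demo/data/kitti/kitti_000008_infos.pkl')

# Same point cloud, different images: if fusion works, this should be > 0.
print('max abs diff:', (captured[0] - captured[1]).abs().max().item())
```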
Emmm... maybe it's not my problem. There is no need to modify the code; you can easily reproduce my bug by replacing the picture in the demo with another picture of the same size.
```shell
python demo/multi_modality_demo.py demo/data/kitti/kitti_000008.bin demo/data/kitti/kitti_000008.png demo/data/kitti/kitti_000008_infos.pkl configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth
```

Just replace kitti_000008.png with kitti_000009.png or any other picture.
https://github.com/open-mmlab/mmdetection3d/blob/master/docs/zh_cn/demo.md
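For reference, the same repro as a Python sketch (hedged: it assumes the demo helpers init_model and inference_multi_modality_detector from mmdet3d.apis and the usual pts_bbox result format; adjust the paths and the second image to your setup):

```python
import torch
from mmdet3d.apis import init_model, inference_multi_modality_detector

model = init_model(
    'configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py',
    'checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth',
    device='cuda:0')

boxes = []
for img in ('demo/data/kitti/kitti_000008.png', 'demo/data/kitti/kitti_000009.png'):
    result, _ = inference_multi_modality_detector(
        model, 'demo/data/kitti/kitti_000008.bin', img,
        'demo/data/kitti/kitti_000008_infos.pkl')
    boxes.append(result[0]['pts_bbox']['boxes_3d'].tensor)

# True here reproduces the bug: changing the image did not change the boxes.
print(torch.equal(boxes[0], boxes[1]))
```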
I look forward to your help.
We will reproduce the problem ASAP.
Yes, I encountered the same problem. I think the model does not rely heavily on the image features.
Sorry, I don't quite understand what you mean. Is this a problem or a feature of the model?
@853108389 I think it is a problem.
> We will reproduce the problem ASAP.

Thank you for the fusion model you provided. I hope you can notify me here after you fix this bug.
Hello, I would like to ask how you got the kitti_000008_infos.pkl file when you ran demo.py for MVX-Net.
Although I know that it is generated from the annotation information, the official docs do not seem to give the specific format of this pkl file, or any conversion script, etc.
If possible, could you please send me a copy of the pkl file you generated? I would be very grateful. This is my QQ email: [email protected]. Thank you very much.
@Chenfanqing, follow this doc to convert KITTI data to pkl files: https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/kitti_det.md
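Concretely, the conversion command from that doc (run from the repo root after arranging the raw KITTI files as described there) generates the kitti_infos_*.pkl files:

```shell
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```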
I know it is quite late, but one way to prove this problem is to multiply the input image by zero after it is read by the code and converted to a numpy array:

```python
x1 = np.array(input_lidar_points)  # shape (-1, 4)
x2 = np.array(input_pixels_image)  # shape (-1, 3)

# I suggest doing the following, so the input image is an array of zeros:
x2 = x2 * 0

# And do another experiment with the lidar points array as zeros:
x1 = x1 * 0
```

I think this will clear up this issue.
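A concrete way to run the ablation suggested above without modifying any model code (a sketch; mmcv.imread/mmcv.imwrite are standard mmcv helpers, and the zeroed filename is only an example):

```python
import mmcv
import numpy as np

# Write an all-zero image with the same shape and dtype as the original.
img = mmcv.imread('demo/data/kitti/kitti_000008.png')
mmcv.imwrite(np.zeros_like(img), 'demo/data/kitti/kitti_000008_zeros.png')

# Now run multi_modality_demo.py (or inference_multi_modality_detector)
# once with the original image and once with the zeroed one, and compare
# the predicted boxes. If they are identical, the image branch has no
# effect; a zeroed point cloud .bin can test the lidar side the same way.
```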
@Tai-Wang, @VVsssssk, by the way, is this issue solved?