The training results are relatively low.
I used the newest code to train. The training performance is far lower than the official mAP of 68.52 and NDS of 71.38. Has anyone else encountered this problem?
Results writes to /tmp/tmp3hvpjhz0/results/results_nusc.json
mAP: 0.5926
mATE: 0.3320
mASE: 0.2648
mAOE: 0.4380
mAVE: 0.5936
mAAE: 0.2401
NDS: 0.6094
Eval time: 100.1s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.817 0.199 0.157 0.071 0.547 0.223
truck 0.539 0.380 0.205 0.078 0.585 0.268
bus 0.567 0.424 0.208 0.123 1.185 0.480
trailer 0.385 0.584 0.209 0.723 0.410 0.125
construction_vehicle 0.267 0.730 0.448 0.854 0.130 0.303
pedestrian 0.707 0.242 0.301 0.968 0.629 0.330
motorcycle 0.617 0.230 0.257 0.332 0.963 0.157
bicycle 0.552 0.187 0.272 0.727 0.301 0.033
traffic_cone 0.754 0.152 0.314 nan nan nan
barrier 0.721 0.192 0.277 0.066 nan nan
Same here. How did you solve the problem?
I met the same problem when training BEVFusion. Could you please tell me how to solve it?
The parameter sweeps_num of the LoadPointsFromMultiSweeps step is set to 0 in the config file of the new code, whereas it was set to 9 in the original code. I think this may be causing the problem, so you can try changing it.
Update: I have tried the above method and the problem has been solved.
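For reference, a minimal sketch of the pipeline entry being discussed, written as an mmdet3d-style Python dict (mirroring the config posted later in this thread; in this repo the equivalent setting lives in YAML form in configs/nuscenes/det/default.yaml):

# Multi-sweep lidar loading step. The newest code ships with sweeps_num=0,
# which feeds single-frame point clouds to the model; restoring 9 (as in the
# original release) recovers the reported mAP/NDS.
dict(
    type='LoadPointsFromMultiSweeps',
    sweeps_num=9,           # 0 in the newest code; set back to 9
    load_dim=5,
    use_dim=5,
    pad_empty_sweeps=True,  # fall back to the key frame when a sample has no sweeps
    remove_close=True,      # drop points too close to the ego vehicle
),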
Thank you very much for your valuable suggestion!
I used the newest code and changed sweeps_num from 0 to 9 at line 33 of bevfusion/configs/nuscenes/det/default.yaml, then retrained. The training performance is very close to the officially announced mAP of 68.52 and NDS of 71.38. The training results are as follows:
Results writes to /tmp/tmpkpx7_6_a/results/results_nusc.json
mAP: 0.6805
mATE: 0.2860
mASE: 0.2535
mAOE: 0.3113
mAVE: 0.2557
mAAE: 0.1878
NDS: 0.7108
Eval time: 96.7s
Per-class results:
Object Class AP ATE ASE AOE AVE AAE
car 0.884 0.170 0.148 0.060 0.274 0.186
truck 0.636 0.321 0.181 0.087 0.247 0.221
bus 0.742 0.336 0.187 0.062 0.437 0.272
trailer 0.430 0.532 0.206 0.571 0.216 0.136
construction_vehicle 0.295 0.724 0.428 0.873 0.117 0.295
pedestrian 0.877 0.132 0.283 0.397 0.216 0.103
motorcycle 0.780 0.186 0.248 0.228 0.350 0.277
bicycle 0.639 0.163 0.255 0.470 0.188 0.012
traffic_cone 0.792 0.119 0.320 nan nan nan
barrier 0.729 0.176 0.277 0.054 nan nan
Hi @huzhihen, you're using the newest code, which means it is BEVFusion-R (with extra radar information), right?
Yes, you are right.
I see. Do you have any results for the LiDAR + camera model only? I've been trying to reproduce that model but couldn't get the reported result. Thank you in advance :)
Hi @huzhihen, do you have any results for the Radar + Camera model?
I also changed sweeps_num from 0 to 9, but the performance is still lower than the original results. To check for differences in training details, could you share your training log?
Hello, have you solved the problem? I have the same issue: I changed sweeps_num from 0 to 9, but the performance is still lower than the original results.
I have ported BEVFusion to another project, but the mAP is very low and I don't know what the problem is. Here is my configuration file:
_base_ = [
    # '../_base_/datasets/nus-3d.py',
    '../_base_/default_runtime.py'
]
voxel_size = [0.075, 0.075, 0.2]
point_cloud_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
metainfo = dict(classes=class_names)
input_modality = dict(use_lidar=True, use_camera=True)
backend_args = None
data_config = {
    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',
             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],
    'Ncams': 6,
    'input_size': (256, 704),
    'src_size': (900, 1600),
    # augmentation
    'resize': (-0.06, 0.11),
    'rot': (-5.4, 5.4),
    'flip': True,
    'crop_h': (0.0, 0.0),
    'resize_test': 0.04,
}
device='cuda'
model = dict(
type = 'BEVFusion',
encoders = dict(
camera = dict(
backbone = dict(
type = 'SwinTransformer',
embed_dims = 96,
depths = [2, 2, 6, 2],
num_heads = [3, 6, 12, 24],
window_size = 7,
mlp_ratio = 4,
qkv_bias = True,
qk_scale = None,
drop_rate = 0.0,
attn_drop_rate = 0.0,
drop_path_rate = 0.2,
patch_norm = True,
out_indices = [1, 2, 3],
with_cp = False,
convert_weights = True,
init_cfg=dict(
type='Pretrained',
checkpoint= 'pretrained/checkpoint/swint-nuimages-pretrained.pth' # noqa: E251
# 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth' # noqa: E501
)
# pretrained='torchvision://resnet50',
# # pretrained='/home/lwx/Desktop/lwx/distill-bev/resnet50.pth',
# type='ResNet',
# depth=50,
# num_stages=4,
# out_indices=(1,2, 3),
# frozen_stages=-1,
# norm_cfg=dict(type='BN', requires_grad=True),
# norm_eval=False,
# with_cp=True,
# style='pytorch'
),
neck=dict(
type='GeneralizedLSSFPN',
in_channels=[192, 384, 768],
# in_channels = [512, 1024, 2048],
# in_channels=[1024, 2048],
out_channels=256,
start_level=0,
num_outs=3,
norm_cfg=dict(type='BN2d', requires_grad=True),
act_cfg=dict(type='ReLU', inplace=True),
upsample_cfg=dict(mode='bilinear', align_corners=False)
),
vtransform=dict(
type='DepthLSSTransformBEVFusion',
in_channels=256,
out_channels=80,
image_size=[256, 704],
# feature_size=[16 , 44],
feature_size=[32, 88],
xbound=[-54.0, 54.0, 0.3],
ybound=[-54.0, 54.0, 0.3],
zbound=[-10.0, 10.0, 20.0],
dbound=[1.0, 60.0, 0.5],
downsample=2
),
# fusion_layer=dict(
# type='ConvFuser', in_channels=[80, 256], out_channels=256)
),
lidar = dict(
voxelize = dict(
max_num_points = 10,
point_cloud_range = point_cloud_range,
voxel_size = voxel_size,
max_voxels = [120000, 160000]
),
backbone = dict(
type = 'SparseEncoder',
in_channels = 5,
output_channels = 128,
order=('conv', 'norm', 'act'),
encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128), (128, 128)),
encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, (1, 1, 0)), (0, 0)),
block_type='basicblock',
sparse_shape = [1440, 1440, 41]
)
)
),
fuser = dict(
type = 'ConvFuser',
in_channels = [80, 256],
out_channels = 256
),
decoder = dict(
backbone = dict(
type='SECOND',
in_channels=256,
out_channels=[128, 256],
layer_nums=[5, 5],
layer_strides=[1, 2],
norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
conv_cfg=dict(type='Conv2d', bias=False)
),
neck = dict(
type='SECONDFPN',
in_channels=[128, 256],
out_channels=[256, 256],
upsample_strides=[1, 2],
norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
upsample_cfg=dict(type='deconv', bias=False),
),
),
heads = dict(
map=None,
object = dict(
type='TransFusionHead',
num_proposals=100,
auxiliary=True,
in_channels=512,
hidden_channel=128,
num_classes=10,
nms_kernel_size=3,
num_decoder_layers=1,
num_heads = 4,
ffn_channel=256,
dropout=0.1,
bn_momentum=0.1,
activation='relu',
train_cfg=dict(
dataset='nuScenes',
point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
grid_size=[1440, 1440, 41],
voxel_size=[0.075, 0.075, 0.2],
out_size_factor=8,
gaussian_overlap=0.1,
min_radius=2,
pos_weight=-1,
code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
assigner=dict(
type='HungarianAssigner3D',
iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'),
cls_cost=dict(
type='mmdet.FocalLossCost',
gamma=2.0,
alpha=0.25,
weight=0.15),
reg_cost=dict(type='BBoxBEVL1Cost', weight=0.25),
iou_cost=dict(type='IoU3DCost', weight=0.25))),
test_cfg=dict(
dataset='nuScenes',
grid_size=[1440, 1440, 41],
out_size_factor=8,
voxel_size=[0.075, 0.075],
pc_range=[-54.0, -54.0],
nms_type=None
),
common_heads=dict(
center=[2, 2], height=[1, 2], dim=[3, 2], rot=[2, 2], vel=[2, 2]),
bbox_coder=dict(
type='TransFusionBBoxCoder',
pc_range=[-54.0, -54.0],
post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
score_threshold=0.0,
out_size_factor=8,
voxel_size=[0.075, 0.075],
code_size=10),
loss_cls=dict(
type='mmdet.FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
reduction='mean',
loss_weight=1.0),
loss_heatmap=dict(
type='mmdet.GaussianFocalLoss', reduction='mean', loss_weight=1.0),
loss_bbox=dict(
type='mmdet.L1Loss', reduction='mean', loss_weight=0.25)
)
),
)
dataset_type = 'NuScenesDataset'
root = 'data/nuscenes/'
file_client_args = dict(backend='disk')
train_pipeline = [
dict(
# type='BEVLoadMultiViewImageFromFiles',
# type = 'LoadMultiViewImageFromFiles_MITBF',
type = 'LoadMultiViewImageFromFiles_BEVDet',
is_train=True,
# to_float32=True,
data_config=data_config,
# color_type='color',
),
dict(
type='LoadPointsFromFile_MITBF',
coord_type='LIDAR',
load_dim=5,
use_dim=5,
),
dict(
type='LoadPointsFromMultiSweeps_MITBF',
sweeps_num=9,
load_dim=5,
use_dim=5,
pad_empty_sweeps=True,
remove_close=True,
),
dict(
type='LoadAnnotations3D_MITBF',
with_bbox_3d=True,
with_label_3d=True,
with_attr_label=False),
# dict(
# type='ImageAug3D',
# final_dim=[256, 704],
# resize_lim=[0.38, 0.55],
# bot_pct_lim=[0.0, 0.0],
# rot_lim=[-5.4, 5.4],
# rand_flip=True,
# is_train=True),
dict(
type='GlobalRotScaleTrans',
scale_ratio_range=[0.9, 1.1],
rot_range=[-0.78539816, 0.78539816],
translation_std=0.5,
update_img2lidar=True),
dict(
    type='RandomFlip3D',
    sync_2d=False,
    flip_ratio_bev_horizontal=0.5,
    flip_ratio_bev_vertical=0.5,
    update_img2lidar=True),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='ObjectNameFilter',
classes=class_names),
# Actually, 'GridMask' is not used here
# dict(
# type='MITGridMask',
# use_h=True,
# use_w=True,
# max_epoch=2,
# rotate=1,
# offset=False,
# ratio=0.5,
# mode=1,
# prob=0.0,
# fixed_prob=True),
# dict(type='PointShuffle'),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(type='Collect3D', keys=['points', 'img_inputs', 'gt_bboxes_3d', 'gt_labels_3d'],
meta_keys=(
'cam2img', 'ori_cam2img', 'camera2ego','lidar2ego','lidar2cam', 'lidar2img', 'cam2lidar',
'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
'lidar_path', 'img_path', 'transformation_3d_flow', 'pcd_rotation',
'pcd_scale_factor', 'pcd_trans',
'lidar_aug_matrix', 'num_pts_feats'))
]
test_pipeline = [
dict(
type = 'LoadMultiViewImageFromFiles_BEVDet',
is_train=True,  # note: this applies train-time image augmentation in the test pipeline; check whether this is intended
data_config=data_config,
),
dict(
type='LoadPointsFromFile_MITBF',
coord_type='LIDAR',
load_dim=5,
use_dim=5,
),
dict(
type='LoadPointsFromMultiSweeps_MITBF',
sweeps_num=9,
load_dim=5,
use_dim=5,
pad_empty_sweeps=True,
remove_close=True,
),
dict(
type='LoadAnnotations3D_MITBF',
with_bbox_3d=True,
with_label_3d=True,
with_attr_label=False),
dict(
type='GlobalRotScaleTrans',
scale_ratio_range=[1, 1],
rot_range=[0, 0],
translation_std=0,
update_img2lidar=True),
dict(
    type='RandomFlip3D',
    sync_2d=False,
    flip_ratio_bev_horizontal=0.0,
    flip_ratio_bev_vertical=0.0,
    update_img2lidar=True),
dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
dict(
type='ObjectNameFilter',
classes=class_names),
dict(type='DefaultFormatBundle3D', class_names=class_names),
dict(
type='Collect3D',
keys=['img_inputs', 'points', 'gt_bboxes_3d', 'gt_labels_3d'],
meta_keys=[
'cam2img', 'ori_cam2img', 'camera2ego','lidar2ego','lidar2cam', 'lidar2img', 'cam2lidar',
'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
'lidar_path', 'img_path', 'transformation_3d_flow', 'pcd_rotation',
'pcd_scale_factor', 'pcd_trans',
'lidar_aug_matrix', 'num_pts_feats'
# 'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
# 'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
# 'lidar_path', 'img_path', 'num_pts_feats', 'num_views'
])
]
data_prefix = dict(
pts='samples/LIDAR_TOP',
CAM_FRONT='samples/CAM_FRONT',
CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
CAM_BACK='samples/CAM_BACK',
CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
sweeps='sweeps/LIDAR_TOP')
data = dict(
samples_per_gpu=1,
workers_per_gpu=2,
train=dict(
type='CBGSDataset_MITBF',
dataset=dict(
type=dataset_type,
data_root=root,
ann_file=root + 'nuscenes_infos_train_mitbf.pkl',
pipeline=train_pipeline,
classes=class_names,
# map_classes=None,
img_info_prototype='bevdet',
modality=input_modality,
test_mode=False,
use_valid_flag=True,
box_type_3d='LiDAR'
)
),
val=dict(
type=dataset_type,
data_root=root,
ann_file=root + 'nuscenes_infos_val_mitbf.pkl',
pipeline=test_pipeline,
classes=class_names,
# map_classes=None,
img_info_prototype='bevdet',
modality=input_modality,
test_mode=False,
use_valid_flag=True,
box_type_3d='LiDAR'
),
test=dict(
type=dataset_type,
data_root=root,
ann_file=root + 'nuscenes_infos_val_mitbf.pkl',
pipeline=test_pipeline,
img_info_prototype='bevdet',
classes=class_names,
# map_classes=None,
modality=input_modality,
test_mode=False,
use_valid_flag=True,
box_type_3d='LiDAR',
)
)
# Learning rate (note: this standalone `lr` variable is unused; the effective
# learning rate is the one passed to the optimizer below)
lr = 0.0001
# Optimizer
optimizer = dict(type='AdamW', lr=2e-4, weight_decay=0.01)
optimizer_config = dict(
    grad_clip=dict(max_norm=35, norm_type=2)
)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[16, 22])
# Default setting for scaling LR automatically
# - `enable` means enable scaling LR automatically
# or not by default.
# - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
# auto_scale_lr = dict(enable=False, base_batch_size=32)
runner = dict(type='EpochBasedRunner', max_epochs=24)
evaluation = dict(interval=1)
checkpoint = dict(interval=1)
# default_hooks = dict(
# logger=dict(type='LoggerHook', interval=50),
# checkpoint=dict(type='CheckpointHook', interval=1))
# custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]
# del _base_.custom_hooks
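A quick way to verify that multi-sweep loading actually takes effect is to compare per-sample point counts with sweeps_num=0 versus 9. A minimal sanity-check sketch, assuming an mmcv/mmdet3d 0.x-style setup like the config above (the config path is hypothetical; point it at your own file):

# With sweeps_num=9 the per-sample point count should be several times larger
# than a single nuScenes key frame (roughly 34k points); with sweeps_num=0 it
# stays at single-frame size.
from mmcv import Config
from mmdet3d.datasets import build_dataset

cfg = Config.fromfile('configs/my_bevfusion_port.py')  # hypothetical path
dataset = build_dataset(cfg.data.train)
sample = dataset[0]
points = sample['points'].data  # a DataContainer wrapping an (N, 5) tensor in this API version
print('points per sample:', points.shape[0])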
@konyul I met the same problem. Have you solved it?
Actually, I haven't solved this problem yet; I'm currently researching other things. If you resolve it, I hope you'll let me know.
Regarding LoadPointsFromMultiSweeps: I think you have already got this working. I met the same problem; would you mind telling me the cause and how you solved it?
Thank you for your interest in our project. This repository is no longer actively maintained, so we will be closing this issue. Please refer to the amazing implementation at MMDetection3D. Thank you again!