mmdetection3d
[Bug] Is training accuracy related to batch_size in bevfusion?
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] I have read the FAQ documentation but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (dev-1.x) or latest version (dev-1.0).
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
main branch https://github.com/open-mmlab/mmdetection3d
Environment
System environment:
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 1155412052
GPU 0,1,2,3,4,5: Tesla V100S-PCIE-32GB
CUDA_HOME: /home/guanjingchao/cuda-11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.55
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.13.1+cu116
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.6
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.3.2 (built against CUDA 11.5)
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.14.1+cu116
OpenCV: 4.8.1
MMEngine: 0.9.0
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1155412052
Distributed launcher: pytorch
Distributed training: True
GPU number: 6
Reproduces the problem - code sample
_base_ = ['../../../configs/_base_/default_runtime.py']
custom_imports = dict(
    imports=['projects.BEVFusion.bevfusion'], allow_failed_imports=False)
# model settings
# Voxel size for voxel encoder
# Usually voxel size is changed consistently with the point cloud range
# If point cloud range is modified, do remember to change all related
# keys in the config.
voxel_size = [0.075, 0.075, 0.2]
point_cloud_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
metainfo = dict(classes=class_names)
dataset_type = 'NuScenesDataset'
data_root = '/home/guanjingchao/datasets/nuscenes/'  # full nuScenes dataset
# data_root = '/home/guanjingchao/datasets/nuscenes-mini/'  # mini nuScenes dataset
data_prefix = dict(
    pts='samples/LIDAR_TOP',
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
    sweeps='sweeps/LIDAR_TOP')
input_modality = dict(use_lidar=True, use_camera=False)
# Example of using a petrel storage backend instead of local files
# (kept commented out; local paths are used below):
# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         './data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/'
#     }))
backend_args = None
model = dict(
    type='BEVFusion',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        pad_size_divisor=32,
        voxelize_cfg=dict(
            max_num_points=10,
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            voxel_size=[0.075, 0.075, 0.2],
            max_voxels=[120000, 160000],
            voxelize_reduce=True)),
    # pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=5),
    # this is already implemented in the voxelize function of bevfusion.py,
    # so setting it here has no effect
    pts_middle_encoder=dict(
        type='BEVFusionSparseEncoder',
        in_channels=5,
        sparse_shape=[1440, 1440, 41],
        order=('conv', 'norm', 'act'),
        norm_cfg=dict(type='BN1d', eps=0.001, momentum=0.01),
        encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128),
                          (128, 128)),
        encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, (1, 1, 0)), (0, 0)),
        block_type='basicblock'),
    pts_backbone=dict(
        type='SECOND',  # backbone network
        in_channels=256,
        out_channels=[128, 256],
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
        conv_cfg=dict(type='Conv2d', bias=False)),
    pts_neck=dict(
        type='SECONDFPN',  # neck network
        in_channels=[128, 256],
        out_channels=[256, 256],
        upsample_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
        upsample_cfg=dict(type='deconv', bias=False),
        use_conv_for_no_stride=True),
    bbox_head=dict(
        type='TransFusionHead',
        num_proposals=200,
        auxiliary=True,
        in_channels=512,
        hidden_channel=128,
        num_classes=10,
        nms_kernel_size=3,
        bn_momentum=0.1,
        num_decoder_layers=1,
        decoder_layer=dict(
            type='TransformerDecoderLayer',
            self_attn_cfg=dict(embed_dims=128, num_heads=8, dropout=0.1),
            cross_attn_cfg=dict(embed_dims=128, num_heads=8, dropout=0.1),
            ffn_cfg=dict(
                embed_dims=128,
                feedforward_channels=256,
                num_fcs=2,
                ffn_drop=0.1,
                act_cfg=dict(type='ReLU', inplace=True),
            ),
            norm_cfg=dict(type='LN'),
            pos_encoding_cfg=dict(input_channel=2, num_pos_feats=128)),
        train_cfg=dict(
            dataset='nuScenes',
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            grid_size=[1440, 1440, 41],
            voxel_size=[0.075, 0.075, 0.2],
            out_size_factor=8,
            gaussian_overlap=0.1,
            min_radius=2,
            pos_weight=-1,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
            assigner=dict(
                type='HungarianAssigner3D',
                iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'),
                cls_cost=dict(
                    type='mmdet.FocalLossCost',
                    gamma=2.0,
                    alpha=0.25,
                    weight=0.15),
                reg_cost=dict(type='BBoxBEVL1Cost', weight=0.25),
                iou_cost=dict(type='IoU3DCost', weight=0.25))),
        test_cfg=dict(
            dataset='nuScenes',
            grid_size=[1440, 1440, 41],
            out_size_factor=8,
            voxel_size=[0.075, 0.075],
            pc_range=[-54.0, -54.0],
            nms_type=None),
        common_heads=dict(
            center=[2, 2], height=[1, 2], dim=[3, 2], rot=[2, 2], vel=[2, 2]),
        bbox_coder=dict(
            type='TransFusionBBoxCoder',
            pc_range=[-54.0, -54.0],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            score_threshold=0.0,
            out_size_factor=8,
            voxel_size=[0.075, 0.075],
            code_size=10),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            reduction='mean',
            loss_weight=1.0),
        loss_heatmap=dict(
            type='mmdet.GaussianFocalLoss', reduction='mean', loss_weight=1.0),
        loss_bbox=dict(type='mmdet.L1Loss', reduction='mean',
                       loss_weight=0.25)))
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'nuscenes_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(
            car=5,
            truck=5,
            bus=5,
            trailer=5,
            construction_vehicle=5,
            traffic_cone=5,
            barrier=5,
            motorcycle=5,
            bicycle=5,
            pedestrian=5)),
    classes=class_names,
    sample_groups=dict(
        car=2,
        truck=3,
        construction_vehicle=7,
        bus=4,
        trailer=6,
        barrier=2,
        motorcycle=6,
        bicycle=6,
        pedestrian=2,
        traffic_cone=2),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=[0, 1, 2, 3, 4],
        backend_args=backend_args))
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='GlobalRotScaleTrans',
        scale_ratio_range=[0.9, 1.1],
        rot_range=[-0.78539816, 0.78539816],
        translation_std=0.5),
    dict(type='BEVFusionRandomFlip3D'),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=[
            'points', 'img', 'gt_bboxes_3d', 'gt_labels_3d', 'gt_bboxes',
            'gt_labels'
        ],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'transformation_3d_flow', 'pcd_rotation',
            'pcd_scale_factor', 'pcd_trans', 'img_aug_matrix',
            'lidar_aug_matrix'
        ])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='PointsRangeFilter',
        point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]),
    dict(
        type='Pack3DDetInputs',
        keys=['img', 'points', 'gt_bboxes_3d', 'gt_labels_3d'],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'num_pts_feats', 'num_views'
        ])
]
train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='CBGSDataset',
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='nuscenes_infos_train.pkl',
            pipeline=train_pipeline,
            metainfo=metainfo,
            modality=input_modality,
            test_mode=False,
            data_prefix=data_prefix,
            use_valid_flag=True,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR')))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        metainfo=metainfo,
        modality=input_modality,
        data_prefix=data_prefix,
        test_mode=True,
        box_type_3d='LiDAR',
        backend_args=backend_args))
test_dataloader = val_dataloader
val_evaluator = dict(
    type='NuScenesMetric',
    data_root=data_root,
    ann_file=data_root + 'nuscenes_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)
test_evaluator = val_evaluator
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
# learning rate
lr = 0.0001
param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, learning rate increases from 0 to lr * 10
    # during the next 12 epochs, learning rate decreases from lr * 10 to
    # lr * 1e-4
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 10,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr * 1e-4,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 12 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]
# runtime settings
train_cfg = dict(by_epoch=True, max_epochs=20, val_interval=1)
val_cfg = dict()
test_cfg = dict()
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2))
# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
#   (a short LR-scaling sketch follows this config)
auto_scale_lr = dict(enable=False, base_batch_size=32)  #2258
log_processor = dict(window_size=50)
default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    checkpoint=dict(type='CheckpointHook', interval=5))
custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]
find_unused_parameters = True
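A note on the auto_scale_lr setting above: it assumes a total batch size of 32 (8 GPUs x 4 samples per GPU) and is disabled (enable=False), so the learning rate is not rescaled when the effective batch size changes. Below is a minimal sketch of the linear scaling rule that this mechanism is based on; scale_lr is my own illustrative helper, not mmengine's API:

# Hypothetical helper illustrating the linear LR scaling rule behind
# auto_scale_lr (illustration only, not mmengine's actual implementation).
def scale_lr(base_lr: float,
             num_gpus: int,
             samples_per_gpu: int,
             base_batch_size: int = 32) -> float:
    """Scale the learning rate linearly with the effective batch size."""
    effective_batch_size = num_gpus * samples_per_gpu
    return base_lr * effective_batch_size / base_batch_size

# With the runs described in this issue (6 GPUs, lr = 0.0001):
print(scale_lr(0.0001, 6, 2))  # batch_size=2 -> effective 12 -> 3.75e-05
print(scale_lr(0.0001, 6, 4))  # batch_size=4 -> effective 24 -> 7.5e-05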
Reproduces the problem - command or script
CUDA_VISIBLE_DEVICES="2,3,4,5,6,7" bash tools/dist_train.sh projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 6 --cfg-options load_from=work_dirs/lidar/lidar_epoch_20.pth model.img_backbone.init_cfg.checkpoint=pre/swint-nuimages-pretrained.pth --amp
Reproduces the problem - error message
When I use batch_size=4 to train the lidar-only model, I can reach the accuracy reported in the paper. However, when I trained the image+lidar fusion model, I had to set batch_size=2 because of insufficient GPU memory, and the accuracy only reached around mAP=0.66. I then also trained with batch_size=4 but in fp16 mode, which gave mAP=0.67. Is this the cause? Is training accuracy related to batch_size in bevfusion? Is the gap caused by training lidar-only with batch_size=4 but image+lidar with batch_size=2? If so, what should I do?
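For reference, assuming all runs used the 6 GPUs listed in the environment above, the effective batch sizes work out as follows (my own arithmetic; the config's base_batch_size is 32 and auto_scale_lr is disabled, so the learning rate is not compensated for the difference):

# Effective (global) batch size = number of GPUs x samples per GPU.
# 8xb4 baseline: 8 * 4 = 32  (what base_batch_size assumes)
# lidar-only:    6 * 4 = 24
# fusion:        6 * 2 = 12
# fusion + amp:  6 * 4 = 24
# A possible mitigation (my assumption, not verified for BEVFusion):
# let mmengine rescale lr by effective_batch_size / base_batch_size,
# as in the scale_lr sketch above.
auto_scale_lr = dict(enable=True, base_batch_size=32)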
Additional information
No response
Hi bro, can you specify the versions you have installed for all the libraries? I am not able to run bash scripts/convert_data.py.
I am using the main branch of mmdetection3d. In this version, you need to use tools/create_data.py to generate the nuScenes dataset info files. You can refer to https://mmdetection3d.readthedocs.io/en/latest/user_guides/dataset_prepare.html.
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
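If it helps, a quick way to confirm that the generated info files ended up where the config expects them (file names taken from the config above; the data_root value here is just an assumption, adjust it to your --out-dir):

import os

data_root = './data/nuscenes/'
# Info files produced by tools/create_data.py and referenced by the
# config's ann_file / info_path fields.
for name in ('nuscenes_infos_train.pkl', 'nuscenes_infos_val.pkl',
             'nuscenes_dbinfos_train.pkl'):
    print(name, os.path.exists(os.path.join(data_root, name)))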
@Manishnayak234