
[Bug] Is training accuracy related to batch_size in bevfusion?

Open wzqforever opened this issue 1 year ago • 2 comments

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment


System environment:
    sys.platform: linux
    Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 1155412052
    GPU 0,1,2,3,4,5: Tesla V100S-PCIE-32GB
    CUDA_HOME: /home/guanjingchao/cuda-11.6
    NVCC: Cuda compilation tools, release 11.6, V11.6.55
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.13.1+cu116
    PyTorch compiling details: PyTorch built with:

  • GCC 9.3

  • C++ Version: 201402

  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications

  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)

  • OpenMP 201511 (a.k.a. OpenMP 4.5)

  • LAPACK is enabled (usually provided by MKL)

  • NNPACK is enabled

  • CPU capability usage: AVX2

  • CUDA Runtime 11.6

  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86

  • CuDNN 8.3.2 (built against CUDA 11.5)

  • Magma 2.6.1

  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.14.1+cu116
    OpenCV: 4.8.1
    MMEngine: 0.9.0

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1155412052
    Distributed launcher: pytorch
    Distributed training: True
    GPU number: 6

Reproduces the problem - code sample

_base_ = ['../../../configs/_base_/default_runtime.py']
custom_imports = dict(
    imports=['projects.BEVFusion.bevfusion'], allow_failed_imports=False)

# model settings
# Voxel size for voxel encoder
# Usually voxel size is changed consistently with the point cloud range
# If point cloud range is modified, do remember to change all related
# keys in the config.
voxel_size = [0.075, 0.075, 0.2]
point_cloud_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
class_names = [
    'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
    'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
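For context, the grid_size and sparse_shape of [1440, 1440, 41] appearing later in this config follow from these two values. A quick arithmetic check (my own sketch, not part of the original config):

# Grid dimensions per axis = (range_max - range_min) / voxel_size.
print((54.0 - -54.0) / 0.075)  # x: 1440.0
print((54.0 - -54.0) / 0.075)  # y: 1440.0
print((3.0 - -5.0) / 0.2)      # z: 40.0; the sparse encoder uses 41,
                               # i.e. one extra slice along z (convention
                               # in mmdetection3d sparse encoders)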

metainfo = dict(classes=class_names)
dataset_type = 'NuScenesDataset'
data_root = '/home/guanjingchao/datasets/nuscenes/'  # full nuScenes dataset
# data_root = '/home/guanjingchao/datasets/nuscenes-mini/'  # mini nuScenes dataset

data_prefix = dict(
    pts='samples/LIDAR_TOP',
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
    sweeps='sweeps/LIDAR_TOP')
input_modality = dict(use_lidar=True, use_camera=False)

# backend_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         './data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/',
#         'data/nuscenes_mini/':
#         's3://openmmlab/datasets/detection3d/nuscenes/'
#     }))
backend_args = None

model = dict(
    type='BEVFusion',
    data_preprocessor=dict(
        type='Det3DDataPreprocessor',
        pad_size_divisor=32,
        voxelize_cfg=dict(
            max_num_points=10,
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            voxel_size=[0.075, 0.075, 0.2],
            max_voxels=[120000, 160000],
            voxelize_reduce=True)),
    # pts_voxel_encoder=dict(type='HardSimpleVFE', num_features=5),
    # this is already implemented in the voxelize function of bevfusion.py,
    # so setting it here has no effect
    pts_middle_encoder=dict(
        type='BEVFusionSparseEncoder',
        in_channels=5,
        sparse_shape=[1440, 1440, 41],
        order=('conv', 'norm', 'act'),
        norm_cfg=dict(type='BN1d', eps=0.001, momentum=0.01),
        encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128),
                          (128, 128)),
        encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, (1, 1, 0)), (0, 0)),
        block_type='basicblock'),
    pts_backbone=dict(
        type='SECOND',  # backbone
        in_channels=256,
        out_channels=[128, 256],
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
        conv_cfg=dict(type='Conv2d', bias=False)),
    pts_neck=dict(
        type='SECONDFPN',  # neck
        in_channels=[128, 256],
        out_channels=[256, 256],
        upsample_strides=[1, 2],
        norm_cfg=dict(type='BN', eps=0.001, momentum=0.01),
        upsample_cfg=dict(type='deconv', bias=False),
        use_conv_for_no_stride=True),
    bbox_head=dict(
        type='TransFusionHead',
        num_proposals=200,
        auxiliary=True,
        in_channels=512,
        hidden_channel=128,
        num_classes=10,
        nms_kernel_size=3,
        bn_momentum=0.1,
        num_decoder_layers=1,
        decoder_layer=dict(
            type='TransformerDecoderLayer',
            self_attn_cfg=dict(embed_dims=128, num_heads=8, dropout=0.1),
            cross_attn_cfg=dict(embed_dims=128, num_heads=8, dropout=0.1),
            ffn_cfg=dict(
                embed_dims=128,
                feedforward_channels=256,
                num_fcs=2,
                ffn_drop=0.1,
                act_cfg=dict(type='ReLU', inplace=True),
            ),
            norm_cfg=dict(type='LN'),
            pos_encoding_cfg=dict(input_channel=2, num_pos_feats=128)),
        train_cfg=dict(
            dataset='nuScenes',
            point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0],
            grid_size=[1440, 1440, 41],
            voxel_size=[0.075, 0.075, 0.2],
            out_size_factor=8,
            gaussian_overlap=0.1,
            min_radius=2,
            pos_weight=-1,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
            assigner=dict(
                type='HungarianAssigner3D',
                iou_calculator=dict(type='BboxOverlaps3D', coordinate='lidar'),
                cls_cost=dict(
                    type='mmdet.FocalLossCost',
                    gamma=2.0,
                    alpha=0.25,
                    weight=0.15),
                reg_cost=dict(type='BBoxBEVL1Cost', weight=0.25),
                iou_cost=dict(type='IoU3DCost', weight=0.25))),
        test_cfg=dict(
            dataset='nuScenes',
            grid_size=[1440, 1440, 41],
            out_size_factor=8,
            voxel_size=[0.075, 0.075],
            pc_range=[-54.0, -54.0],
            nms_type=None),
        common_heads=dict(
            center=[2, 2], height=[1, 2], dim=[3, 2], rot=[2, 2], vel=[2, 2]),
        bbox_coder=dict(
            type='TransFusionBBoxCoder',
            pc_range=[-54.0, -54.0],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            score_threshold=0.0,
            out_size_factor=8,
            voxel_size=[0.075, 0.075],
            code_size=10),
        loss_cls=dict(
            type='mmdet.FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            reduction='mean',
            loss_weight=1.0),
        loss_heatmap=dict(
            type='mmdet.GaussianFocalLoss', reduction='mean', loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.L1Loss', reduction='mean', loss_weight=0.25)))

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'nuscenes_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(
            car=5, truck=5, bus=5, trailer=5, construction_vehicle=5,
            traffic_cone=5, barrier=5, motorcycle=5, bicycle=5, pedestrian=5)),
    classes=class_names,
    sample_groups=dict(
        car=2, truck=3, construction_vehicle=7, bus=4, trailer=6, barrier=2,
        motorcycle=6, bicycle=6, pedestrian=2, traffic_cone=2),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=[0, 1, 2, 3, 4],
        backend_args=backend_args))

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    dict(type='ObjectSample', db_sampler=db_sampler),
    dict(
        type='GlobalRotScaleTrans',
        scale_ratio_range=[0.9, 1.1],
        rot_range=[-0.78539816, 0.78539816],
        translation_std=0.5),
    dict(type='BEVFusionRandomFlip3D'),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(
        type='ObjectNameFilter',
        classes=[
            'car', 'truck', 'construction_vehicle', 'bus', 'trailer',
            'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
        ]),
    dict(type='PointShuffle'),
    dict(
        type='Pack3DDetInputs',
        keys=[
            'points', 'img', 'gt_bboxes_3d', 'gt_labels_3d', 'gt_bboxes',
            'gt_labels'
        ],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'transformation_3d_flow', 'pcd_rotation',
            'pcd_scale_factor', 'pcd_trans', 'img_aug_matrix',
            'lidar_aug_matrix'
        ])
]

test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=5,
        use_dim=5,
        backend_args=backend_args),
    dict(
        type='LoadPointsFromMultiSweeps',
        sweeps_num=9,
        load_dim=5,
        use_dim=5,
        pad_empty_sweeps=True,
        remove_close=True,
        backend_args=backend_args),
    dict(
        type='PointsRangeFilter',
        point_cloud_range=[-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]),
    dict(
        type='Pack3DDetInputs',
        keys=['img', 'points', 'gt_bboxes_3d', 'gt_labels_3d'],
        meta_keys=[
            'cam2img', 'ori_cam2img', 'lidar2cam', 'lidar2img', 'cam2lidar',
            'ori_lidar2img', 'img_aug_matrix', 'box_type_3d', 'sample_idx',
            'lidar_path', 'img_path', 'num_pts_feats', 'num_views'
        ])
]

train_dataloader = dict(
    batch_size=4,
    num_workers=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='CBGSDataset',
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file='nuscenes_infos_train.pkl',
            pipeline=train_pipeline,
            metainfo=metainfo,
            modality=input_modality,
            test_mode=False,
            data_prefix=data_prefix,
            use_valid_flag=True,
            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR')))
val_dataloader = dict(
    batch_size=1,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='nuscenes_infos_val.pkl',
        pipeline=test_pipeline,
        metainfo=metainfo,
        modality=input_modality,
        data_prefix=data_prefix,
        test_mode=True,
        box_type_3d='LiDAR',
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(
    type='NuScenesMetric',
    data_root=data_root,
    ann_file=data_root + 'nuscenes_infos_val.pkl',
    metric='bbox',
    backend_args=backend_args)
test_evaluator = val_evaluator

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')

# learning rate
lr = 0.0001
param_scheduler = [
    # learning rate scheduler
    # During the first 8 epochs, learning rate increases from 0 to lr * 10
    # during the next 12 epochs, learning rate decreases from lr * 10 to
    # lr * 1e-4
    dict(
        type='CosineAnnealingLR',
        T_max=8,
        eta_min=lr * 10,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=12,
        eta_min=lr * 1e-4,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True),
    # momentum scheduler
    # During the first 8 epochs, momentum increases from 0 to 0.85 / 0.95
    # during the next 12 epochs, momentum increases from 0.85 / 0.95 to 1
    dict(
        type='CosineAnnealingMomentum',
        T_max=8,
        eta_min=0.85 / 0.95,
        begin=0,
        end=8,
        by_epoch=True,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingMomentum',
        T_max=12,
        eta_min=1,
        begin=8,
        end=20,
        by_epoch=True,
        convert_to_iter_based=True)
]

# runtime settings
train_cfg = dict(by_epoch=True, max_epochs=20, val_interval=1)
val_cfg = dict()
test_cfg = dict()

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2))
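If GPU memory only allows 2 samples per GPU for the fusion model (as described in the report below), one option, sketched here on the assumption that MMEngine's OptimWrapper gradient accumulation is acceptable for this setup, is to accumulate gradients over two steps so the optimizer sees an effective batch of 4 per GPU. Note that BatchNorm statistics are still computed over the real mini-batch of 2, so this may not exactly reproduce a true batch_size=4 run:

# Sketch: emulate an effective per-GPU batch of 4 while loading only 2
# samples at a time. OptimWrapper steps the optimizer every
# `accumulative_counts` backward passes and averages gradients across them.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
    clip_grad=dict(max_norm=35, norm_type=2),
    accumulative_counts=2)  # pair with train_dataloader.batch_size=2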

# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (4 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=32)  #2258
log_processor = dict(window_size=50)
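A hedged note on the two lines above (my own arithmetic, not from the original issue): base_batch_size=32 assumes 8 GPUs x 4 samples per GPU, while the runs described below use 6 GPUs, so the effective total batch size differs from what lr=0.0001 was tuned for. When auto_scale_lr is enabled, MMEngine applies the linear scaling rule, multiplying the LR by the ratio of the actual total batch size to base_batch_size:

# Linear scaling rule applied when auto_scale_lr.enable=True
# (or, I believe, when training with the --auto-scale-lr flag).
base = 32                        # 8 GPUs x 4 samples (config assumption)
print(0.0001 * (6 * 4) / base)   # lidar-only run, 6 GPUs x 4 -> 7.5e-05
print(0.0001 * (6 * 2) / base)   # fusion run, 6 GPUs x 2 -> 3.75e-05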

default_hooks = dict(
    logger=dict(type='LoggerHook', interval=50),
    checkpoint=dict(type='CheckpointHook', interval=5))
custom_hooks = [dict(type='DisableObjectSampleHook', disable_after_epoch=15)]

find_unused_parameters = True

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES="2,3,4,5,6,7" bash tools/dist_train.sh projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 6 --cfg-options load_from=work_dirs/lidar/lidar_epoch_20.pth model.img_backbone.init_cfg.checkpoint=pre/swint-nuimages-pretrained.pth --amp
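A hedged aside (not from the original post): since --cfg-options accepts nested config keys, the per-GPU batch size can also be lowered from the command line rather than by editing the config, e.g.:

CUDA_VISIBLE_DEVICES="2,3,4,5,6,7" bash tools/dist_train.sh projects/BEVFusion/configs/bevfusion_lidar-cam_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 6 --cfg-options train_dataloader.batch_size=2 --amp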

Reproduces the problem - error message

When I train the lidar-only model with batch_size=4, I can reach the accuracy reported in the paper. However, when I trained the image+lidar fusion model, I had to set batch_size=2 because of insufficient GPU memory, and the accuracy only reached around mAP=0.66. I then set batch_size=4 again, this time in fp16 mode, and got mAP=0.67. Is this the reason? Is training accuracy related to batch_size in BEVFusion? Is it related to training lidar-only with batch_size=4 but image+lidar with batch_size=2? If so, what should I do?

Additional information

No response

wzqforever avatar Feb 01 '24 05:02 wzqforever

Hi bro, can you specify the versions you have installed for all the libs? I am not able to run bash scripts/convert_data.py.

Manishnayak234 avatar Feb 05 '24 08:02 Manishnayak234

I am using the main branch of mmdetection3d. In this version, you need to use tools/create_data.py to generate the nuScenes dataset files. You can refer to https://mmdetection3d.readthedocs.io/en/latest/user_guides/dataset_prepare.html.

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
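For what it's worth (inferred from the file names the config above references, not stated in this thread), a successful run should leave at least these info files under the data root:

data/nuscenes/nuscenes_infos_train.pkl    # ann_file of train_dataloader
data/nuscenes/nuscenes_infos_val.pkl      # ann_file of val_dataloader / evaluator
data/nuscenes/nuscenes_dbinfos_train.pkl  # info_path of db_sampler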

@Manishnayak234

wzqforever avatar Feb 07 '24 13:02 wzqforever