OpenOccupancy

Single-GPU debugging through PyCharm

Open · onionysy opened this issue 1 year ago · 3 comments

We want to debug on a single GPU in PyCharm instead of using distributed training, but we ran into the following problem:

```
fatal: not a git repository (or any of the parent directories): .git
2023-09-26 21:37:31,804 - mmdet - INFO - Environment info:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (GCC) 6.1.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2
OpenCV: 4.8.0
MMCV: 1.4.0
MMCV Compiler: GCC 6.1
MMCV CUDA Compiler: 11.3
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.1+
2023-09-26 21:37:36,433 - mmdet - INFO - Distributed training: False
2023-09-26 21:37:36,433 - mmdet - INFO - Set random seed to 0, deterministic: False
/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py:400: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
  warnings.warn('DeprecationWarning: pretrained is deprecated, '
2023-09-26 21:37:37,026 - mmdet - INFO - Number of params: 123453321
2023-09-26 21:37:37,063 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
2023-09-26 21:37:37,063 - mmcv - INFO - load model from: torchvision://resnet50
2023-09-26 21:37:37,063 - mmcv - INFO - load checkpoint from torchvision path: torchvision://resnet50
2023-09-26 21:37:37,124 - mmcv - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
2023-09-26 21:37:37,140 - mmdet - INFO - initialize SECONDFPN with init_cfg [{'type': 'Kaiming', 'layer': 'ConvTranspose2d'}, {'type': 'Constant', 'layer': 'NaiveSyncBatchNorm2d', 'val': 1.0}]
WARNING!!!!, Only can be used for obtain inference speed!!!!
WARNING!!!!, Only can be used for obtain inference speed!!!!
2023-09-26 21:37:44,838 - mmdet - INFO - Start running, host: ysy@ysy-System-Product-Name, work_dir: /home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/work_dirs/Multimodal-R50_img1600_cascade_x4
2023-09-26 21:37:44,839 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) CheckpointHook
(NORMAL ) OccEvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_train_epoch:
(VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) OccEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_train_iter:
(VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) OccEvalHook
(LOW ) IterTimerHook
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) OccEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
after_train_epoch:
(NORMAL ) CheckpointHook
(NORMAL ) OccEvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_val_epoch:
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_val_iter:
(LOW ) IterTimerHook
after_val_iter:
(LOW ) IterTimerHook
after_val_epoch:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
after_run:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
2023-09-26 21:37:44,839 - mmdet - INFO - workflow: [('train', 1)], max: 15 epochs
2023-09-26 21:37:44,839 - mmdet - INFO - Checkpoints will be saved to /home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/work_dirs/Multimodal-R50_img1600_cascade_x4 by HardDiskBackend.
Traceback (most recent call last):
  File "/snap/pycharm-educational/57/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-educational/57/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/train.py", line 207, in <module>
    main()
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/train.py", line 196, in main
    custom_train_model(
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/apis/train.py", line 20, in custom_train_model
    custom_train_detector(
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/apis/mmdet_train.py", line 149, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet3d-0.17.1-py3.8-linux-x86_64.egg/mmdet3d/models/detectors/base.py", line 59, in forward
    return self.forward_train(**kwargs)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 202, in forward_train
    voxel_feats, img_feats, pts_feats, depth = self.extract_feat(
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 113, in extract_feat
    img_voxel_feats, depth, img_feats = self.extract_img_feat(img, img_metas)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 68, in extract_img_feat
    img_enc_feats = self.image_encoder(img[0])
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 44, in image_encoder
    backbone_feats = self.img_backbone(imgs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 642, in forward
    x = res_layer(x)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward
    out = _inner_forward(x)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py", line 268, in _inner_forward
    out = self.norm1(out)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 732, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
python-BaseException
Backend QtAgg is interactive backend. Turning interactive mode on.
```

We thought this was a SyncBN problem, so we changed SyncBN to BN in the configuration file (note that with SyncBN still in place, debugging through `test.py` works), but then ran into the following error:


```
fatal: not a git repository (or any of the parent directories): .git
2023-09-26 21:27:49,543 - mmdet - INFO - Environment info:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (GCC) 6.1.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2
OpenCV: 4.8.0
MMCV: 1.4.0
MMCV Compiler: GCC 6.1
MMCV CUDA Compiler: 11.3
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.1+
2023-09-26 21:27:54,071 - mmdet - INFO - Distributed training: False
2023-09-26 21:27:54,072 - mmdet - INFO - Set random seed to 0, deterministic: False
/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py:400: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
  warnings.warn('DeprecationWarning: pretrained is deprecated, '
2023-09-26 21:27:54,676 - mmdet - INFO - Number of params: 123453321
2023-09-26 21:27:54,717 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
2023-09-26 21:27:54,717 - mmcv - INFO - load model from: torchvision://resnet50
2023-09-26 21:27:54,717 - mmcv - INFO - load checkpoint from torchvision path: torchvision://resnet50
2023-09-26 21:27:54,783 - mmcv - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
2023-09-26 21:27:54,799 - mmdet - INFO - initialize SECONDFPN with init_cfg [{'type': 'Kaiming', 'layer': 'ConvTranspose2d'}, {'type': 'Constant', 'layer': 'NaiveSyncBatchNorm2d', 'val': 1.0}]
WARNING!!!!, Only can be used for obtain inference speed!!!!
WARNING!!!!, Only can be used for obtain inference speed!!!!
2023-09-26 21:28:02,689 - mmdet - INFO - Start running, host: ysy@ysy-System-Product-Name, work_dir: /home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/work_dirs/Multimodal-R50_img1600_cascade_x4
2023-09-26 21:28:02,689 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) CheckpointHook
(NORMAL ) OccEvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_train_epoch:
(VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) OccEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_train_iter:
(VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) OccEvalHook
(LOW ) IterTimerHook
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) OccEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
after_train_epoch:
(NORMAL ) CheckpointHook
(NORMAL ) OccEvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_val_epoch:
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
before_val_iter:
(LOW ) IterTimerHook
after_val_iter:
(LOW ) IterTimerHook
after_val_epoch:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
after_run:
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
2023-09-26 21:28:02,690 - mmdet - INFO - workflow: [('train', 1)], max: 15 epochs
2023-09-26 21:28:02,690 - mmdet - INFO - Checkpoints will be saved to /home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/work_dirs/Multimodal-R50_img1600_cascade_x4 by HardDiskBackend.
Traceback (most recent call last):
  File "/snap/pycharm-educational/57/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-educational/57/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/train.py", line 207, in <module>
    main()
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/tools/train.py", line 196, in main
    custom_train_model(
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/apis/train.py", line 20, in custom_train_model
    custom_train_detector(
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/apis/mmdet_train.py", line 149, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/mmdet3d-0.17.1-py3.8-linux-x86_64.egg/mmdet3d/models/detectors/base.py", line 59, in forward
    return self.forward_train(**kwargs)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 202, in forward_train
    voxel_feats, img_feats, pts_feats, depth = self.extract_feat(
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 115, in extract_feat
    pts_voxel_feats, pts_feats = self.extract_pts_feat(points)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/detectors/occnet.py", line 98, in extract_pts_feat
    pts_enc_feats = self.pts_middle_encoder(voxel_features, coors, batch_size)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/neural_network/occupancy/OpenOccupancy-main/projects/occ_plugin/occupancy/voxel_encoder/sparse_lidar_enc.py", line 169, in forward
    x_conv1 = self.conv1(x)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/spconv/pytorch/modules.py", line 138, in forward
    input = module(input)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/spconv/pytorch/modules.py", line 142, in forward
    input = input.replace_feature(module(input.features))
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 135, in forward
    self._check_input_dim(input)
  File "/home/ysy/anaconda3/envs/openocc2/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 408, in _check_input_dim
    raise ValueError("expected 4D input (got {}D input)".format(input.dim()))
ValueError: expected 4D input (got 2D input)
python-BaseException
Backend QtAgg is interactive backend. Turning interactive mode on.
```

We can see that this is a tensor-dimension error, but we have no idea how to fix it. Do you have any suggestions?

onionysy · Sep 26 '23 13:09

You should change the BN configuration here (SyncBN --> BN).

JeffWang987 · Sep 27 '23 08:09
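Why a blanket SyncBN --> BN swap trades one error for another: PyTorch's `SyncBatchNorm` calls `torch.distributed.get_world_size()` in training mode (first traceback), so `train.py` needs an initialized default process group even on one GPU, while in eval mode it skips synchronization, which is consistent with `test.py` working with SyncBN. `SyncBatchNorm` also accepts any input from 2D upward, whereas mmcv's `'BN'` maps to `nn.BatchNorm2d`, which only accepts 4D input. The sparse LiDAR encoder normalizes the 2D `(N, C)` feature matrix of a sparse tensor (`input.replace_feature(module(input.features))` in the second traceback), so it needs `'BN1d'` (mmcv's name for `nn.BatchNorm1d`). The sketch below is a hypothetical helper, not part of OpenOccupancy or mmcv; it assumes the config is a plain nested dict and that the relevant keys are `model.img_backbone` and `model.pts_middle_encoder`, guessed from the attribute names in the tracebacks rather than checked against the real config files.

```python
import copy


def replace_norm_type(cfg, old="SyncBN", new="BN", overrides=None):
    """Return a copy of a nested config dict in which every dict whose "type"
    equals `old` is switched to `new`, or to overrides[prefix] when its dotted
    key path lies under a prefix listed in `overrides` (hypothetical helper)."""
    overrides = overrides or {}

    def join(path, key):
        return f"{path}.{key}" if path else str(key)

    def pick(path):
        # The longest matching prefix wins, so specific overrides beat general ones.
        hits = [p for p in overrides if path == p or path.startswith(p + ".")]
        return overrides[max(hits, key=len)] if hits else new

    def walk(node, path):
        if isinstance(node, dict):
            out = {k: walk(v, join(path, k)) for k, v in node.items()}
            if out.get("type") == old:
                out["type"] = pick(path)
            return out
        if isinstance(node, list):
            return [walk(v, join(path, i)) for i, v in enumerate(node)]
        return copy.deepcopy(node)  # copy leaves so the input config is never mutated

    return walk(cfg, "")


# Hypothetical config excerpt; the real OpenOccupancy key names may differ.
cfg = {
    "model": {
        "img_backbone": {"type": "ResNet", "norm_cfg": {"type": "SyncBN", "requires_grad": True}},
        "pts_middle_encoder": {"norm_cfg": {"type": "SyncBN", "eps": 1e-3, "momentum": 0.01}},
    }
}

fixed = replace_norm_type(cfg, overrides={"model.pts_middle_encoder": "BN1d"})
print(fixed["model"]["img_backbone"]["norm_cfg"]["type"])        # BN
print(fixed["model"]["pts_middle_encoder"]["norm_cfg"]["type"])  # BN1d
```

Any module that normalizes 5D voxel tensors would likewise need `'BN3d'` rather than `'BN'`, so it would also belong in the override map. Writing the result back into an mmcv `Config` object is left out of this sketch.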

> You should change the BN configurations at here (SyncBN-->BN)

I changed the config as you advised and then hit the same error, `ValueError: expected 4D input (got 2D input)`, mentioned above.

Agito555 · Jun 17 '24 10:06

Modifying the code (SyncBN --> BN) doesn't work for me either; it raises `ValueError: expected 4D input (got 2D input)`. Actually, there is no need to change SyncBN to BN to debug on a single GPU: launching `tools/train.py` through `torch.distributed.launch` with a single process initializes the default process group that SyncBN needs during training. Here is the solution for debugging in VS Code; I guess there is a similar solution for PyCharm.

You can add the following entry to the `configurations` array of your `.vscode/launch.json` and press F5 to start debugging.

```json
{
    "name": "Python Distributed_training",
    "type": "debugpy",
    "request": "launch",
    "module": "torch.distributed.launch",
    "console": "internalConsole",
    "cwd": "path/to/your/own/project/dir",
    "justMyCode": false,
    "env": {
        "CUDA_VISIBLE_DEVICES": "0"
    },
    "args": [
        "--nnodes", "1",
        "--node_rank", "0",
        "--nproc_per_node", "1",
        "--master_addr", "127.0.0.1",
        "--master_port", "29501",
        "./tools/train.py",
        "path/to/your/config",
        "--seed", "0",
        "--launcher", "pytorch"
    ]
}
```

It works for me; I hope it helps anyone who runs into the same problem. References: Reference1, Reference2, Reference3

Agito555 · Jun 18 '24 09:06
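For PyCharm, the same single-process launch can probably be reproduced in one of two ways: select "Module name" instead of "Script path" in the run configuration and give `torch.distributed.launch` the same arguments as the launch.json entry above, or run `tools/train.py` directly with the variables that `torch.distributed.launch` would export set by hand. A minimal sketch of the second route follows. It assumes PyTorch's default `env://` initialization (which reads `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE`), that `--launcher pytorch` makes mmcv's `init_dist` read `RANK`, and that `tools/train.py` defaults `--local_rank` to 0. The helper names are hypothetical and the setup has not been verified in PyCharm.

```python
def single_gpu_launch_env(gpu="0", master_addr="127.0.0.1", master_port=29501):
    """Environment a one-process torch.distributed.launch run would export
    (hypothetical helper)."""
    return {
        "CUDA_VISIBLE_DEVICES": gpu,      # same GPU pinning as the launch.json "env" block
        "MASTER_ADDR": master_addr,       # read by the env:// init method
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": "1",                # a single process in total
        "RANK": "0",                      # also read by mmcv's init_dist('pytorch')
        "LOCAL_RANK": "0",
    }


def pycharm_env_field(env):
    """Join NAME=value pairs with ';' for PyCharm's "Environment variables" field."""
    return ";".join(f"{name}={value}" for name, value in env.items())


def train_script_args(config_path, seed=0):
    """Script parameters matching the launch.json entry, minus the launcher's own flags."""
    return [config_path, "--seed", str(seed), "--launcher", "pytorch"]


if __name__ == "__main__":
    print(pycharm_env_field(single_gpu_launch_env()))
    # CUDA_VISIBLE_DEVICES=0;MASTER_ADDR=127.0.0.1;MASTER_PORT=29501;WORLD_SIZE=1;RANK=0;LOCAL_RANK=0
    print(" ".join(train_script_args("path/to/your/config")))
    # path/to/your/config --seed 0 --launcher pytorch
```

In the run configuration, set the script path to `tools/train.py`, the working directory to the project root, the parameters to the second printed line, and the environment variables to the first. Because only one process runs, PyCharm's debugger should stop at breakpoints as usual.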