
Problems with AMP training when reimplementing Co-DETR

Open ysysys666 opened this issue 1 year ago • 8 comments

Notice

There are several common situations in the reimplementation issues as below

  1. Reimplement a model in the model zoo using the provided configs

Checklist

  1. I have searched related issues but cannot get the expected help.

Describe the issue

Excuse me, does Co-DETR support AMP training? When I reimplement Co-DETR with AMP, I hit the error "RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source". After I added a type conversion, I hit another error: "matched_row_inds, matched_col_inds = linear_sum_assignment(cost) ValueError: matrix contains invalid numeric entries".
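Editor's sketch of the second error, not a confirmed fix for Co-DETR: SciPy's linear_sum_assignment rejects cost matrices that contain non-finite values such as NaN. Under AMP, a plausible (unverified) cause is float16 overflow somewhere in the matching cost. The helper name safe_assignment below is hypothetical; it casts the cost to float32 and replaces non-finite entries with a large finite penalty before matching. It only hides the symptom, so computing the cost in float32 in the first place may be the better route.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def safe_assignment(cost):
    """Hypothetical helper: sanitize a cost matrix, then run Hungarian matching.

    Non-finite entries (NaN, +inf, -inf) are replaced with finite values so
    that linear_sum_assignment does not raise on them.
    """
    cost = np.asarray(cost, dtype=np.float32)
    finite = np.isfinite(cost)
    # Penalty clearly larger than any legitimate cost in the matrix.
    big = float(np.abs(cost[finite]).max()) * 10.0 + 1.0 if finite.any() else 1.0
    cost = np.nan_to_num(cost, nan=big, posinf=big, neginf=-big)
    return linear_sum_assignment(cost)


# Toy cost matrix with a NaN and an inf in the off-diagonal entries.
cost = np.array([[1.0, np.nan],
                 [np.inf, 2.0]])
rows, cols = safe_assignment(cost)
print(rows.tolist(), cols.tolist())
```

In this toy case the penalized pairs are avoided and the diagonal is matched.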

Reproduction

  1. What command or script did you run?

bash ./tools/dist_train.sh 'mmdetection/projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py' 4 --work-dir 'mmdetection/outputs/codetr_5scale_r50_4xb4_12e_coco_results' --amp --auto-scale-lr --launcher 'pytorch'

  2. What config dir you run?

mmdetection/projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py

  3. Did you make any modifications on the code or config? Did you understand what you have modified?

No

  4. What dataset did you use?

COCO

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.4, V11.4.48
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1
OpenCV: 4.8.1
MMEngine: 0.10.1
MMDetection: 3.2.0+fe3f809

  2. You may add additional information that may be helpful for locating the problem, such as

  1. How you installed PyTorch [e.g., pip, conda, source] conda
  2. Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

A placeholder for results comparison

Issue fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

ysysys666 avatar Jan 22 '24 14:01 ysysys666

I'm also running into the same issue.

Kobamiyannnn avatar Jan 26 '24 14:01 Kobamiyannnn

A similar problem occurs when using AmpOptimizer with DETR:

  File "/home/louis/miniconda3/envs/mmengine/lib/python3.8/site-packages/mmdet/models/dense_heads/detr_head.py", line 437, in _get_targets_single
    bbox_targets[pos_inds] = pos_gt_bboxes_targets
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source.
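
Editor's sketch of the dtype mismatch in the traceback above, not a patch taken from MMDetection: an indexed assignment into a float16 tensor fails when the source is float32, so one option is to cast the source to the destination's dtype before writing. The helper name assign_matching_dtype and the toy values are hypothetical; the tensor names mirror the traceback. Allocating the destination in float32 instead may be preferable if box targets need full precision.

```python
import torch


def assign_matching_dtype(dest, index, src):
    """Hypothetical helper: write src into dest[index] after casting src
    to dest's dtype, so the index-put sees matching dtypes."""
    dest[index] = src.to(dest.dtype)
    return dest


# Destination in half precision, as can happen for tensors created under AMP.
bbox_targets = torch.zeros(4, 4, dtype=torch.float16)
pos_inds = torch.tensor([0, 2])
# Source left in full precision (toy values).
pos_gt_bboxes_targets = torch.tensor(
    [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.6, 0.7, 0.8]], dtype=torch.float32)

assign_matching_dtype(bbox_targets, pos_inds, pos_gt_bboxes_targets)
print(bbox_targets.dtype)
```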

makecent avatar Jan 27 '24 03:01 makecent

I got the same issue. Have you solved it?

Cosmo1210 avatar Mar 23 '24 03:03 Cosmo1210

Got the same issue.

black-prince222 avatar Apr 01 '24 13:04 black-prince222

I got the same issue. Have you solved it?

JackeyGuo avatar Jun 13 '24 12:06 JackeyGuo

A similar problem occurs when using AmpOptimWrapper with DETR.

Helen-Cheung avatar Jun 16 '24 18:06 Helen-Cheung

Has anyone solved this problem?

caiduoduo12138 avatar Jul 23 '24 06:07 caiduoduo12138