mmdetection
Problems with AMP training when reimplementing Co-DETR
Notice
Reimplementation issues usually fall into one of several common situations, listed below:
- Reimplement a model in the model zoo using the provided configs
Checklist
- I have searched related issues but cannot get the expected help.
Describe the issue
Excuse me, does Co-DETR support AMP training? When I reimplement Co-DETR with AMP, I hit the error "RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source". After adding a type conversion, I hit a second error: "matched_row_inds, matched_col_inds = linear_sum_assignment(cost) ValueError: matrix contains invalid numeric entries".
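One possible cause of the second error (an assumption, not confirmed anywhere in this thread) is that the matching cost is computed in half precision and overflows to inf or becomes NaN, which `linear_sum_assignment` rejects. The sketch below uses NumPy only; `sanitize_cost` and the `big` value are hypothetical names chosen for illustration, not part of MMDetection.

```python
import numpy as np


def sanitize_cost(cost, big=1e6):
    """Return a finite float32 copy of a matching cost matrix.

    Hypothetical helper: NaN and +inf become `big`, -inf becomes `-big`.
    """
    cost = np.asarray(cost, dtype=np.float32)
    return np.nan_to_num(cost, nan=big, posinf=big, neginf=-big)


# A made-up fp16 cost matrix; 60000 * 2 exceeds the fp16 maximum (65504).
half_cost = np.array([[1.0, 60000.0], [2.0, 3.0]], dtype=np.float16)
with np.errstate(over="ignore"):
    overflowed = half_cost * np.float16(2.0)  # entry [0, 1] overflows to inf

clean = sanitize_cost(overflowed)
print(np.isinf(overflowed).any(), np.isfinite(clean).all())
```

Replacing non-finite entries only hides the symptom. If the assumption holds, computing the cost in float32 in the first place (for example by casting the inputs before the cost is built) is probably the cleaner direction.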
Reproduction
- What command or script did you run?
bash ./tools/dist_train.sh 'mmdetection/projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py' 4 --work-dir 'mmdetection/outputs/codetr_5scale_r50_4xb4_12e_coco_results' --amp --auto-scale-lr --launcher 'pytorch'
- What config did you run?
mmdetection/projects/CO-DETR/configs/codino/co_dino_5scale_r50_lsj_8xb2_1x_coco.py
- Did you make any modifications on the code or config? Did you understand what you have modified?
No
- What dataset did you use?
COCO
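For context, the `--amp` flag in the command above is, as far as I know, handled in MMDetection 3.x `tools/train.py` by switching the optimizer wrapper type to `AmpOptimWrapper` and setting `loss_scale` to `'dynamic'`. A minimal sketch of that override on a plain dict follows; `enable_amp` is a hypothetical helper and the optimizer values are illustrative, not copied from the Co-DETR config.

```python
import copy


def enable_amp(optim_wrapper_cfg):
    """Return a copy of an optim_wrapper config with AMP switched on.

    Hypothetical helper mirroring the override applied for --amp.
    """
    cfg = copy.deepcopy(optim_wrapper_cfg)
    cfg["type"] = "AmpOptimWrapper"
    cfg["loss_scale"] = "dynamic"
    return cfg


# Illustrative base config; the real values live in the Co-DETR config file.
base = dict(
    type="OptimWrapper",
    optimizer=dict(type="AdamW", lr=2e-4, weight_decay=1e-4),
)
amp_cfg = enable_amp(base)
print(amp_cfg)
```

Writing the same two keys directly into the config should be equivalent to passing `--amp`, which can make it easier to confirm whether the errors come from AMP itself rather than from the launcher arguments.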
Environment
- Please run
python mmdet/utils/collect_env.py
to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.4, V11.4.48
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.3.2 (built against CUDA 11.5)
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.13.1
OpenCV: 4.8.1
MMEngine: 0.10.1
MMDetection: 3.2.0+fe3f809
- You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]: conda
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
Results
If applicable, paste the related results here, e.g., what you expect and what you get.
A placeholder for results comparison
Issue fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
I'm also running into the same issue.
Similar problems happen when using AmpOptimWrapper in DETR:
File "/home/louis/miniconda3/envs/mmengine/lib/python3.8/site-packages/mmdet/models/dense_heads/detr_head.py", line 437, in _get_targets_single
bbox_targets[pos_inds] = pos_gt_bboxes_targets
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and Float for the source.
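The traceback suggests that `bbox_targets` ended up in half precision while `pos_gt_bboxes_targets` stayed in float32. A minimal sketch of the type-conversion workaround mentioned in the original post is shown below; the tensor names mirror the traceback, but `assign_bbox_targets`, the shapes, and the values are made up for illustration, and this is not a fix confirmed by the maintainers.

```python
import torch


def assign_bbox_targets(bbox_targets, pos_inds, pos_gt_bboxes_targets):
    """Write positive targets into bbox_targets, matching dtypes first.

    Hypothetical helper: the source is cast to the destination dtype so
    the indexed assignment does not mix Half and Float.
    """
    bbox_targets[pos_inds] = pos_gt_bboxes_targets.to(bbox_targets.dtype)
    return bbox_targets


# Made-up data: a half-precision destination and a float32 source.
bbox_targets = torch.zeros(5, 4, dtype=torch.half)
pos_inds = torch.tensor([1, 3])
pos_gt = torch.tensor([[0.1, 0.2, 0.3, 0.4],
                       [0.5, 0.6, 0.7, 0.8]])  # float32 by default

out = assign_bbox_targets(bbox_targets, pos_inds, pos_gt)
print(out.dtype)
```

The opposite choice, allocating `bbox_targets` in float32 so that the targets keep full precision, may be preferable for the loss computation; which one is appropriate here is an open question in this thread.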
Got the same issue. Have you solved it?
Got the same issue.
Has anyone solved this problem?