
CUDA out of memory when testing Mask2Former model

Open SEU-Qwq opened this issue 3 years ago • 9 comments

Prerequisite

💬 Describe the reimplementation questions

I ran into a problem when trying to reimplement the Mask2Former model. There was no issue during training, but during validation or testing the program took up more and more GPU memory and finally reported a CUDA out of memory error. There were no problems when I tested other models.

The whole error is as follows:

query_indices = top_indices // self.num_classes
[>>>>>>>>>>>>>>>>> ] 720/2026, 3.7 task/s, elapsed: 196s, ETA: 356s
Traceback (most recent call last):
  File "tools/test.py", line 321, in <module>
    main()
  File "tools/test.py", line 261, in main
    outputs = single_gpu_test(model, data_loader, args.show)
  File "/home/qwq212/mmdetection/mmdet/apis/test.py", line 29, in single_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/qwq212/anaconda3/envs/mmdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qwq212/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 51, in forward
    return super().forward(*inputs, **kwargs)
  File "/home/qwq212/anaconda3/envs/mmdet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/qwq212/anaconda3/envs/mmdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/qwq212/anaconda3/envs/mmdet/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/home/qwq212/mmdetection/mmdet/models/detectors/base.py", line 174, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/qwq212/mmdetection/mmdet/models/detectors/base.py", line 147, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/qwq212/mmdetection/mmdet/models/detectors/maskformer.py", line 155, in simple_test
    results = self.panoptic_fusion_head.simple_test(
  File "/home/qwq212/mmdetection/mmdet/models/seg_heads/panoptic_fusion_heads/maskformer_fusion_head.py", line 230, in simple_test
    ins_results = self.instance_postprocess(
  File "/home/qwq212/mmdetection/mmdet/models/seg_heads/panoptic_fusion_heads/maskformer_fusion_head.py", line 153, in instance_postprocess
    mask_pred_binary = (mask_pred > 0).float()
RuntimeError: CUDA out of memory. Tried to allocate 4.40 GiB (GPU 0; 23.70 GiB total capacity; 10.50 GiB already allocated; 4.32 GiB free; 14.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Environment

sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.74
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.0+cu111
OpenCV: 4.6.0
MMCV: 1.7.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.25.2+

Expected results

No response

Additional information

I didn't modify the code. I trained on my own dataset, whose format is the same as COCO.

SEU-Qwq avatar Dec 02 '22 07:12 SEU-Qwq

There are some solutions here: https://github.com/open-mmlab/mmdetection/blob/master/docs/en/faq.md#training

BIGWangYuDong avatar Dec 02 '22 07:12 BIGWangYuDong

There are some solutions here: https://github.com/open-mmlab/mmdetection/blob/master/docs/en/faq.md#training

I met the problem when testing, not training, so will this work? I think the main problem is that GPU memory keeps growing during testing, which doesn't happen with other models, and I don't know the reason for it.

SEU-Qwq avatar Dec 02 '22 07:12 SEU-Qwq

A similar situation has also appeared in SOLO. It may be caused by some post-processing. You can try to use AvoidOOM.
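For context, AvoidCUDAOOM (in mmdet.utils) wraps a callable so that, if it raises a CUDA out-of-memory error, the call is retried after emptying the CUDA cache, then with the inputs converted to FP16, and finally with the inputs moved to the CPU. A minimal usage sketch; the decorated function and tensor shape below are made-up stand-ins for whatever step is running out of memory:

import torch
from mmdet.utils import AvoidCUDAOOM

@AvoidCUDAOOM.retry_if_cuda_oom
def binarize_masks(mask_pred):
    # stand-in for a memory-hungry post-processing step
    return (mask_pred > 0).float()

# Retried automatically (empty cache -> FP16 -> CPU) if it hits CUDA OOM.
masks = binarize_masks(torch.randn(100, 1024, 1024, device='cuda'))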

BIGWangYuDong avatar Dec 02 '22 07:12 BIGWangYuDong

A similar situation has also appeared in SOLO. It may be caused by some post-processing. You can try to use AvoidOOM.

Thank you for your reply. I will try it now.

SEU-Qwq avatar Dec 02 '22 07:12 SEU-Qwq

A similar situation has also appeared in SOLO. It may be caused by some post-processing. You can try to use AvoidOOM.

Thank you for your reply. I will try it now.

Related Issue: https://github.com/open-mmlab/mmdetection/issues/6908

BIGWangYuDong avatar Dec 02 '22 07:12 BIGWangYuDong

A similar situation has also appeared in SOLO. It may be caused by some post-processing. You can try to use AvoidOOM.

Thank you for your reply. I will try it now.

Related Issue: #6908

I added AvoidCUDAOOM in test.py like this; is it right?

outputs = AvoidCUDAOOM.retry_if_cuda_oom(single_gpu_test)(model, data_loader, args.show)

SEU-Qwq avatar Dec 02 '22 08:12 SEU-Qwq

It seems it cannot be used around the whole test loop; it should be used inside a specific function, such as the iou_calculator.
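In other words, the wrapper should go around the individual memory-hungry call inside the model code, not around single_gpu_test. As a hypothetical sketch (not an official patch), applied to the line that fails in the traceback above, inside MaskFormerFusionHead.instance_postprocess where the mask_pred tensor is already available:

from mmdet.utils import AvoidCUDAOOM

def _binarize(mask_pred):
    # the step that allocates the large (num_masks, H, W) float tensor
    return (mask_pred > 0).float()

# original line in instance_postprocess:
#     mask_pred_binary = (mask_pred > 0).float()
# wrapped so it retries (empty cache -> FP16 -> CPU) on CUDA OOM:
mask_pred_binary = AvoidCUDAOOM.retry_if_cuda_oom(_binarize)(mask_pred)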

BIGWangYuDong avatar Dec 02 '22 09:12 BIGWangYuDong

It may be the case that some of the images in the test set are fairly large. Mask2Former resizes the predicted masks back to the input image scale on the GPU. This doesn't affect training because the predicted masks are never resized back to the input image scale during training.

PeterVennerstrom avatar Dec 06 '22 20:12 PeterVennerstrom

It may be the case that some of the images in the test set are fairly large. Mask2Former resizes the predicted masks back to the input image scale on the GPU. This doesn't affect training because the predicted masks are never resized back to the input image scale during training.

Sorry, but the images in my dataset are usually smaller than 1024*1024, which is the size used during training.

SEU-Qwq avatar Dec 08 '22 09:12 SEU-Qwq

Most probably, the CUDA out of memory error at test time occurs when the model detects a large number of objects in an image (count > 500 per image). So, irrespective of the test image size, it will keep throwing the same error.

Doing the mask calculations for all of those detected objects on the GPU is what causes CUDA to run out of memory.
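A rough back-of-the-envelope estimate illustrates this (the numbers below are made up for illustration, not measured from this issue): a single dense float32 mask tensor of shape (num_masks, H, W) needs num_masks * H * W * 4 bytes.

num_masks, H, W = 500, 1500, 1500             # e.g. many detections on one image
size_gib = num_masks * H * W * 4 / 1024 ** 3  # float32 = 4 bytes per element
print(f"{size_gib:.2f} GiB")                  # ~4.19 GiB for one intermediate tensor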

  • One possible solution to handle CUDA out of memory is to use AvoidCUDAOOM

  • Another solution is to move the CUDA tensors to the CPU in the code block where the error happens, like this:

Old Code:

Do mask calculation on GPU

New Code:

try:
    # do the mask calculation on the GPU
    ...
except RuntimeError:  # typically "CUDA out of memory" at this point
    # move the tensors to the CPU and redo the mask calculation there
    ...

In my case, the error was caused by this line running on CUDA tensors.
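The exact line differs from case to case; as a generic sketch of the try/except fallback above, using the mask-binarization line from the traceback at the top of this issue as the example operation:

import torch

def binarize_with_cpu_fallback(mask_pred):
    """Threshold mask logits to binary masks, falling back to the CPU on CUDA OOM."""
    try:
        # fast path: do the comparison and cast on the GPU
        return (mask_pred > 0).float()
    except RuntimeError as e:
        if 'out of memory' not in str(e):
            raise
        # release cached blocks, then redo the computation on the CPU
        torch.cuda.empty_cache()
        return (mask_pred.cpu() > 0).float()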

pd2871 avatar Jan 26 '23 10:01 pd2871