CUDA out of memory when testing Mask2Former model
Prerequisite
- [x] I have searched Issues and Discussions but cannot get the expected help.
- [X] I have read the FAQ documentation but cannot get the expected help.
- [X] The bug has not been fixed in the latest version (master) or latest version (3.x).
💬 Describe the reimplementation questions
I ran into a problem when trying to reimplement the Mask2Former model. There was no problem during training, but during validation or testing the program took up more and more GPU memory and finally reported the error: CUDA out of memory. In fact, there was no such problem when I tested other models.
The whole error is as follows:
```
query_indices = top_indices // self.num_classes
[>>>>>>>>>>>>>>>>> ] 720/2026, 3.7 task/s, elapsed: 196s, ETA: 356s
Traceback (most recent call last):
  File "tools/test.py", line 321, in
```
Environment
sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.74
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.0+cu111
OpenCV: 4.6.0
MMCV: 1.7.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.25.2+
Expected results
No response
Additional information
I didn't make any modifications. I trained on my own dataset, whose format is the same as COCO; I didn't modify anything else.
There are some solutions here: https://github.com/open-mmlab/mmdetection/blob/master/docs/en/faq.md#training
I met the problem when testing, not training; will this work? I think the main problem is that the memory keeps growing during testing, which didn't happen with other models, and I don't know the reason for it.
A similar situation has also appeared in SOLO. It may be caused by some post-processing. You can try to use AvoidOOM
Thank you for your reply. I will try it now.
Related Issue: https://github.com/open-mmlab/mmdetection/issues/6908
I added AvoidCUDAOOM in test.py like this, is it right?
```python
from mmdet.utils import AvoidCUDAOOM

outputs = AvoidCUDAOOM.retry_if_cuda_oom(single_gpu_test)(model, data_loader, args.show)
```
It seems it cannot be used on the whole test loop; it should be used on a specific function, such as an iou_calculator.
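For reference, a minimal sketch of that pattern under stated assumptions: `compute_iou` below is a hypothetical placeholder for whatever memory-hungry step is failing, while `AvoidCUDAOOM` itself comes from `mmdet.utils`.

```python
import torch
from mmdet.utils import AvoidCUDAOOM


def compute_iou(bboxes1, bboxes2):
    """Hypothetical memory-hungry step (pairwise IoU) used for illustration."""
    area1 = (bboxes1[:, 2] - bboxes1[:, 0]) * (bboxes1[:, 3] - bboxes1[:, 1])
    area2 = (bboxes2[:, 2] - bboxes2[:, 0]) * (bboxes2[:, 3] - bboxes2[:, 1])
    lt = torch.max(bboxes1[:, None, :2], bboxes2[None, :, :2])
    rb = torch.min(bboxes1[:, None, 2:], bboxes2[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    overlap = wh[..., 0] * wh[..., 1]
    return overlap / (area1[:, None] + area2[None, :] - overlap)


# Dummy boxes in (x1, y1, x2, y2) format, just to make the sketch runnable.
xy = torch.rand(1000, 2, device='cuda') * 500
wh = torch.rand(1000, 2, device='cuda') * 100
boxes_a = torch.cat([xy, xy + wh], dim=-1)
boxes_b = boxes_a.clone()

# Wrap the specific function instead of the whole test loop: on CUDA OOM,
# retry_if_cuda_oom empties the cache, retries in FP16, and finally on CPU.
ious = AvoidCUDAOOM.retry_if_cuda_oom(compute_iou)(boxes_a, boxes_b)
```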
It may be the case that some of the images in the test set are fairly large. Mask2Former resizes the predicted masks back to input image scale on GPU. This case doesn't affect training as the images are never resized back to input image scale.
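To illustrate with rough numbers (the shapes below are assumptions for the sake of example, not values taken from this issue): upsampling all per-query masks to the input image size on GPU allocates a tensor of size num_queries x H_img x W_img, which is easy to underestimate.

```python
import torch
import torch.nn.functional as F

# Assumed shapes for illustration: 100 queries, masks predicted at 256x256,
# test image around 1333x800 after the test pipeline.
mask_pred = torch.rand(100, 256, 256, device='cuda')

# Resizing on GPU allocates ~100 * 1333 * 800 * 4 bytes, roughly 0.4 GB of
# float32 per image; if anything keeps a reference across images, usage grows.
masks = F.interpolate(
    mask_pred[:, None], size=(1333, 800), mode='bilinear',
    align_corners=False)[:, 0]

# One workaround discussed later in this thread: do this step on CPU instead.
masks_cpu = F.interpolate(
    mask_pred[:, None].cpu(), size=(1333, 800), mode='bilinear',
    align_corners=False)[:, 0]
```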
Sorry, but the images in my dataset are usually smaller than 1024*1024, which is the size used when training.
Most probably, the CUDA out of memory error on test images occurs when the model detects a large number of objects in an image (count > 500 per image). So, irrespective of the test image size, it will keep throwing the same error.
Doing the mask calculations for all of those detected objects on CUDA is what exhausts the GPU memory.
- One possible solution to handle CUDA out of memory is to implement AvoidCUDAOOM.
- Another solution is to convert the CUDA tensors to CPU in the code block where the error is happening, like this:
Old code:
```
do mask calculation on GPU
```
New code:
```
try:
    do mask calculation on GPU
except RuntimeError:  # CUDA out of memory
    do mask calculation on CPU
```
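A minimal runnable sketch of that fallback pattern, under assumptions: `resize_masks` and the tensor shapes are stand-ins for whatever block is actually raising the OOM, not the exact line in mmdetection.

```python
import torch
import torch.nn.functional as F


def resize_masks(mask_pred, img_size):
    """Stand-in for the memory-hungry mask calculation."""
    return F.interpolate(
        mask_pred[:, None], size=img_size, mode='bilinear',
        align_corners=False)[:, 0]


mask_pred = torch.rand(500, 256, 256, device='cuda')  # assume 500+ detections

try:
    # First try the mask calculation on GPU.
    masks = resize_masks(mask_pred, (1024, 1024))
except RuntimeError as err:
    # Only fall back when the failure really is a CUDA OOM.
    if 'out of memory' not in str(err):
        raise
    torch.cuda.empty_cache()
    masks = resize_masks(mask_pred.cpu(), (1024, 1024))
```

Note that the CPU path will be much slower, so this trade-off only pays off for the occasional image that overflows GPU memory.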
In my case, the error was caused by this line running on CUDA tensors.