GPU Out of Memory Issue
When I try to evaluate with your code, I run into a GPU out-of-memory issue. Specifically, I am running this command:
CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python entry.py evaluate --conf_files configs/seem/focalt_unicl_lang_v1.yaml --overrides COCO.INPUT.IMAGE_SIZE 1024 MODEL.DECODER.HIDDEN_DIM 512 MODEL.ENCODER.CONVS_DIM 512 MODEL.ENCODER.MASK_DIM 512 VOC.TEST.BATCH_SIZE_TOTAL 8 TEST.BATCH_SIZE_TOTAL 8 REF.TEST.BATCH_SIZE_TOTAL 8 FP16 True WEIGHT True RESUME_FROM ./pretrained/seem_focalt_v1.pt
Could you share how much memory is needed for evaluation?
Error log:
File "/home/Segment-Everything-Everywhere-All-At-Once/entry.py", line 75, in <module>
main()
File "/home/Segment-Everything-Everywhere-All-At-Once/entry.py", line 70, in main
trainer.eval()
File "/home/Segment-Everything-Everywhere-All-At-Once/trainer/default_trainer.py", line 79, in eval
results = self._eval_on_set(self.save_folder)
File "/home/Segment-Everything-Everywhere-All-At-Once/trainer/default_trainer.py", line 87, in _eval_on_set
results = self.pipeline.evaluate_model(self, save_folder)
File "/home/Segment-Everything-Everywhere-All-At-Once/./pipeline/XDecoderPipeline.py", line 155, in evaluate_model
outputs = model(batch, mode=eval_type)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/BaseModel.py", line 19, in forward
outputs = self.model(*inputs, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/architectures/seem_model_v1.py", line 318, in forward
return self.evaluate(batched_inputs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/architectures/seem_model_v1.py", line 387, in evaluate
outputs = self.sem_seg_head(features, target_queries=queries_grounding)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/body/xdecoder_head.py", line 99, in forward
return self.layers(features, mask, target_queries, target_vlp, task, extra)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/body/xdecoder_head.py", line 102, in layers
mask_features, transformer_encoder_features, multi_scale_features = self.pixel_decoder.forward_features(features)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/vision/encoder/transformer_encoder_fpn.py", line 293, in forward_features
cur_fpn = lateral_conv(x)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/detectron2/layers/wrappers.py", line 110, in forward
x = self.norm(x)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 279, in forward
return F.group_norm(
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/functional.py", line 2558, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.41 GiB. GPU 0 has a total capacty of 23.64 GiB of which 386.50 MiB is free. Process 2385114 has 4.12 GiB memory in use. Process 2385112 has 17.04 GiB memory in use. Process 2385111 has 1.05 GiB memory in use. Process 2385113 has 1.05 GiB memory in use. Of the allocated memory 3.07 GiB is allocated by PyTorch, and 860.55 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I used 4 Titan RTX GPUs with 24576 MiB each.
@MaureenZOU @jwyang
Could you check on this issue?
I also get the error when using seem_samvitb with the same command as in assets/readmes/EVAL.md.
How can I change the settings to run your code without GPU memory problems? As a first step, I changed the batch size to 2, but it still fails.
In INSTALL.md, you mentioned
CUDA enabled GPU with Memory > 8GB (Evaluation)
so I think something is wrong with my setup.
When I check the GPU status, only 1 GPU is used, even when I change CUDA_VISIBLE_DEVICES and the mpirun process count. The number of mpirun processes only changes the number of concurrent tasks on that single GPU.
This image shows the status when I try to evaluate with 8 GPUs.
Did you use MPI to distribute across GPUs, or across CPUs?
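For reference, here is a minimal diagnostic I can run to check which GPU each mpirun rank actually uses. It is only a sketch and not part of the repository; it assumes OpenMPI, which sets OMPI_COMM_WORLD_LOCAL_RANK (other launchers use different variables):

# rank_check.py -- hypothetical helper, not part of the repository
import os
import torch

# OpenMPI exports the per-node rank of each process
local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)  # pin this process to its own GPU
print(f"local rank {local_rank} -> cuda:{torch.cuda.current_device()} "
      f"({torch.cuda.get_device_name(local_rank)})")

Launched with CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python rank_check.py, the ranks should report cuda:0 through cuda:3. Without the set_device call every process defaults to cuda:0, which looks like the single-GPU behavior I am seeing.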
same problem!!
Have you solved this problem?
Hi @dongho-Han, I noticed that in your script you used TEST.BATCH_SIZE_TOTAL 8 on 4 GPUs. Can you try changing it to 4?
Same suggestion. Evaluating multiple images in a single batch will cause: 1. inaccurate evaluation (because of padding); 2. GPU OOM. I usually use 1 GPU for evaluation.
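For example, a single-GPU variant of the command at the top of this issue would look like this (only CUDA_VISIBLE_DEVICES, the mpirun process count, and the *_BATCH_SIZE_TOTAL overrides are changed; everything else is kept as in the original command and is untested here):

CUDA_VISIBLE_DEVICES=0 mpirun -n 1 python entry.py evaluate --conf_files configs/seem/focalt_unicl_lang_v1.yaml --overrides COCO.INPUT.IMAGE_SIZE 1024 MODEL.DECODER.HIDDEN_DIM 512 MODEL.ENCODER.CONVS_DIM 512 MODEL.ENCODER.MASK_DIM 512 VOC.TEST.BATCH_SIZE_TOTAL 1 TEST.BATCH_SIZE_TOTAL 1 REF.TEST.BATCH_SIZE_TOTAL 1 FP16 True WEIGHT True RESUME_FROM ./pretrained/seem_focalt_v1.pt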
Hi, I was facing the same issue; wrapping the evaluation in torch.no_grad() solved it. You can find the gist file here
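For anyone else hitting this, a minimal sketch of that change (hypothetical; it mirrors the call site shown in the traceback above, pipeline/XDecoderPipeline.py line 155, and the actual gist may differ):

# Sketch: wrap the evaluation forward pass so no autograd graph is built,
# which frees activation memory immediately instead of keeping it for backward.
import torch

def evaluate_batch(model, batch, eval_type):
    model.eval()               # also disables dropout / batchnorm updates
    with torch.no_grad():      # no gradients are tracked during evaluation
        return model(batch, mode=eval_type)

On recent PyTorch versions, torch.inference_mode() works as well and saves a bit more memory.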
