GPU Out of Memory Issue
When I try to evaluate with your code, I run into a GPU out-of-memory issue. Specifically, I am running this command:
CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python entry.py evaluate --conf_files configs/seem/focalt_unicl_lang_v1.yaml --overrides COCO.INPUT.IMAGE_SIZE 1024 MODEL.DECODER.HIDDEN_DIM 512 MODEL.ENCODER.CONVS_DIM 512 MODEL.ENCODER.MASK_DIM 512 VOC.TEST.BATCH_SIZE_TOTAL 8 TEST.BATCH_SIZE_TOTAL 8 REF.TEST.BATCH_SIZE_TOTAL 8 FP16 True WEIGHT True RESUME_FROM ./pretrained/seem_focalt_v1.pt
Could you share how much memory is needed for evaluation?
Error log:
File "/home/Segment-Everything-Everywhere-All-At-Once/entry.py", line 75, in <module>
main()
File "/home/Segment-Everything-Everywhere-All-At-Once/entry.py", line 70, in main
trainer.eval()
File "/home/Segment-Everything-Everywhere-All-At-Once/trainer/default_trainer.py", line 79, in eval
results = self._eval_on_set(self.save_folder)
File "/home/Segment-Everything-Everywhere-All-At-Once/trainer/default_trainer.py", line 87, in _eval_on_set
results = self.pipeline.evaluate_model(self, save_folder)
File "/home/Segment-Everything-Everywhere-All-At-Once/./pipeline/XDecoderPipeline.py", line 155, in evaluate_model
outputs = model(batch, mode=eval_type)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/BaseModel.py", line 19, in forward
outputs = self.model(*inputs, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/architectures/seem_model_v1.py", line 318, in forward
return self.evaluate(batched_inputs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/architectures/seem_model_v1.py", line 387, in evaluate
outputs = self.sem_seg_head(features, target_queries=queries_grounding)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/body/xdecoder_head.py", line 99, in forward
return self.layers(features, mask, target_queries, target_vlp, task, extra)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/body/xdecoder_head.py", line 102, in layers
mask_features, transformer_encoder_features, multi_scale_features = self.pixel_decoder.forward_features(features)
File "/home/Segment-Everything-Everywhere-All-At-Once/modeling/vision/encoder/transformer_encoder_fpn.py", line 293, in forward_features
cur_fpn = lateral_conv(x)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/detectron2/layers/wrappers.py", line 110, in forward
x = self.norm(x)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 279, in forward
return F.group_norm(
File "/root/anaconda3/envs/seem/lib/python3.9/site-packages/torch/nn/functional.py", line 2558, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.41 GiB. GPU 0 has a total capacty of 23.64 GiB of which 386.50 MiB is free. Process 2385114 has 4.12 GiB memory in use. Process 2385112 has 17.04 GiB memory in use. Process 2385111 has 1.05 GiB memory in use. Process 2385113 has 1.05 GiB memory in use. Of the allocated memory 3.07 GiB is allocated by PyTorch, and 860.55 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I used 4 Titan RTX GPUs with 24576 MiB each.
@MaureenZOU @jwyang
Could you check on this issue?
I also get the error when using seem_samvitb with the same command as in assets/readmes/EVAL.md.
How can I change the settings to run your code without GPU memory problems? As a first step, I changed the batch size to 2, but it still fails.
In INSTALL.md, you mentioned
CUDA enabled GPU with Memory > 8GB (Evaluation)
so I think something is wrong with my setup.
When I check the GPU status, only 1 GPU is used, even when I change CUDA_VISIBLE_DEVICES and the mpirun process count. The number of mpirun processes only changes the number of concurrent tasks on that single GPU.
This image shows the status when I try to evaluate with 8 GPUs.
Did you use MPI to distribute across GPUs, or across CPUs?
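For reference, here is a minimal diagnostic I can run to check which GPU each mpirun rank actually uses. It is only a sketch and not part of the repository; it assumes OpenMPI, which sets OMPI_COMM_WORLD_LOCAL_RANK (other launchers use different variables):

# rank_check.py -- hypothetical helper, not part of the repository
import os
import torch

# OpenMPI exports the per-node rank of each process
local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)  # pin this process to its own GPU
print(f"local rank {local_rank} -> cuda:{torch.cuda.current_device()} "
      f"({torch.cuda.get_device_name(local_rank)})")

Launched with CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python rank_check.py, the ranks should report cuda:0 through cuda:3. Without the set_device call every process defaults to cuda:0, which looks like the single-GPU behavior I am seeing.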
same problem!!
Have you solved this problem?
Hi @dongho-Han, I noticed that in your script you used TEST.BATCH_SIZE_TOTAL 8 on 4 GPUs. Can you try changing it to 4?
Same suggestion. Evaluating multiple images in a single batch will cause: 1. inaccurate evaluation (because of padding); 2. GPU OOM. I usually use 1 GPU for evaluation.
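For example, a single-GPU variant of the command at the top of this issue would look like this (only CUDA_VISIBLE_DEVICES, the mpirun process count, and the *_BATCH_SIZE_TOTAL overrides are changed; everything else is kept as in the original command and is untested here):

CUDA_VISIBLE_DEVICES=0 mpirun -n 1 python entry.py evaluate --conf_files configs/seem/focalt_unicl_lang_v1.yaml --overrides COCO.INPUT.IMAGE_SIZE 1024 MODEL.DECODER.HIDDEN_DIM 512 MODEL.ENCODER.CONVS_DIM 512 MODEL.ENCODER.MASK_DIM 512 VOC.TEST.BATCH_SIZE_TOTAL 1 TEST.BATCH_SIZE_TOTAL 1 REF.TEST.BATCH_SIZE_TOTAL 1 FP16 True WEIGHT True RESUME_FROM ./pretrained/seem_focalt_v1.pt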
Hi, I was facing the same issue; wrapping the evaluation in torch.no_grad() solved it. You can find the gist file here
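For anyone else hitting this, a minimal sketch of that change (hypothetical; it mirrors the call site shown in the traceback above, pipeline/XDecoderPipeline.py line 155, and the actual gist may differ):

# Sketch: wrap the evaluation forward pass so no autograd graph is built,
# which frees activation memory immediately instead of keeping it for backward.
import torch

def evaluate_batch(model, batch, eval_type):
    model.eval()               # also disables dropout / batchnorm updates
    with torch.no_grad():      # no gradients are tracked during evaluation
        return model(batch, mode=eval_type)

On recent PyTorch versions, torch.inference_mode() works as well and saves a bit more memory.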
