Hi. I have a question about testing on the THUMOS14 dataset. Training on THUMOS14 works fine, but testing reports a CUDA out-of-memory error. "tools/test.py" already wraps inference in "with torch.no_grad():", so why does CUDA memory still grow gradually? The log follows. Thanks.
Evaluate checkpoint: workdir/tallformer/1.0.0-vswin_b_256x256-12GB/epoch_600_weights.pth
[>>>>>>>>>>>>>>>>>>>>>>>> ] 105/212, 0.0 task/s, elapsed: 2341s, ETA: 2386s
Traceback (most recent call last):
File "tools/test.py", line 151, in <module>
if __name__ == "__main__":
File "tools/test.py", line 81, in main
if not os.path.isfile(args.out):
File "tools/test.py", line 58, in test
result = engine(data)[0]
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedacore/parallel/data_parallel.py", line 31, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/val_engine.py", line 14, in forward
return self.forward_impl(**data)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/val_engine.py", line 17, in forward_impl
dets = self.infer(imgs, video_metas)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 117, in infer
return self._aug_infer(imgs, video_metas)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 83, in _aug_infer
tdets = self._get_raw_dets(imgs, video_metas)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 36, in _get_raw_dets
feats = self.extract_feats(imgs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/engines/infer_engine.py", line 24, in extract_feats
feats = self.model(img, train=False)
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/detectors/mem_single_stage_detector.py", line 89, in forward
feats = self.forward_eval(x)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/detectors/mem_single_stage_detector.py", line 74, in forward_eval
feats = self.backbone(x)
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/chunk_model.py", line 51, in forward
return forward_x(x)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/chunk_model.py", line 46, in forward_x
return self.forward_nochunk_inp_output(x)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/chunk_model.py", line 112, in forward_nochunk_inp_output
x = super().forward(x) # shape: [n, c, d, h, w]
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 819, in forward
x = layer(x.contiguous())
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 532, in forward
x = blk(x, attn_mask)
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 371, in forward
x = self.forward_part1(x, mask_matrix, self.dummy_tensor)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 334, in forward_part1
attn_windows = self.attn(x_windows, mask=attn_mask) # BnW, WdWhWw, C
File "/home/ubuntu/users/caoqiushi/anaconda3/envs/vedatad1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/users/caoqiushi/TALLFormer-main/vedatad/models/backbones/vswin.py", line 210, in forward
attn = attn + relative_position_bias.unsqueeze(0) # B, nH, N, N
RuntimeError: CUDA out of memory. Tried to allocate 4.40 GiB (GPU 0; 31.75 GiB total capacity; 20.25 GiB already allocated; 2.95 GiB free; 20.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
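As a side note, the allocator hint in the error message can be tried by setting an environment variable before rerunning the test script. A minimal sketch; the value 128 is an arbitrary assumption, not a recommendation from the codebase:

```shell
# Hint the CUDA caching allocator not to keep large fragmented blocks.
# The 128 MB cap is an illustrative guess; tune it for your GPU.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"
# then rerun tools/test.py as before
```

This only helps when reserved memory greatly exceeds allocated memory (fragmentation); it does not reduce the model's actual footprint.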
If I'm not mistaken, it's because during training the model uses the video's memory bank to reduce the load on the GPU, whereas at test time it loads the entire video's frames, so GPU memory usage increases substantially!
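The general idea of bounding peak memory by running a backbone over fixed-size chunks instead of the whole video can be illustrated with a toy sketch. Plain Python lists stand in for tensors here, and `chunk_size` and `backbone` are hypothetical names, not the codebase's API:

```python
def chunked_forward(frames, chunk_size, backbone):
    """Run `backbone` over `frames` in fixed-size chunks so that only
    `chunk_size` frames are "live" at once, then concatenate the results."""
    feats = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]  # only this slice is resident
        feats.extend(backbone(chunk))             # features are small vs. raw frames
    return feats

# Toy usage: the "backbone" just averages each frame's pixel list.
frames = [[i, i + 1, i + 2] for i in range(10)]
features = chunked_forward(frames, chunk_size=4,
                           backbone=lambda c: [sum(f) / len(f) for f in c])
print(features[0], len(features))  # 1.0 10
```

In the real model the trade-off is the same: peak memory scales with the chunk size rather than with the full video length, at the cost of extra forward passes.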
Thanks for your answer. But how did you solve this problem? My GPU is an NVIDIA Tesla V100 SXM2 (32 GB). I would have thought that should be enough to complete the test.
Hi @caoqiushi, you could try decreasing the batch size for inference, i.e. setting samples_per_gpu=1 during inference. The default inference batch size is 4.
Thank you for your answer. I will try it.