
torch.cuda.OutOfMemoryError: CUDA out of memory when running real_basicvsr / basicvsr++

txy00001 opened this issue 1 year ago · 2 comments

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmagic

Environment

CUDA 11.7, cuDNN 8.9, GPU: 4090 (full environment details were attached as a screenshot)

Reproduces the problem - code sample

```python
import os
import time

from mmagic.apis import MMagicInferencer
from mmengine import mkdir_or_exist

# Create a MMagicInferencer instance and infer
video = '/home/txy/code/blur/video/6.mp4'
result_out_dir = '/home/txy/code/blur/output/6.mp4'
mkdir_or_exist(os.path.dirname(result_out_dir))

beg = time.time()
editor = MMagicInferencer('real_basicvsr', device='cuda:1')
results = editor.infer(video=video, result_out_dir=result_out_dir)
# Note: the original `time.time-beg` omitted the call parentheses on time.time
print(time.time() - beg)
```

Reproduces the problem - command or script

Same script as in the code sample above.

Reproduces the problem - error message

```
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG19_Weights.IMAGENET1K_V1. You can also use weights=VGG19_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
Loads checkpoint by http backend from path: https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: step_counter

01/25 17:15:56 - mmengine - WARNING - Failed to search registry with scope "mmagic" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmagic" is a correct scope, or whether the registry is initialized.
/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the save_dir argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
Traceback (most recent call last):
  File "/home/txy/code/blur/demo/bas_real/real_infer_video.py", line 12, in <module>
    results = editor.infer(video=video, result_out_dir=result_out_dir)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/mmagic_inferencer.py", line 231, in infer
    return self.inferencer(
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/__init__.py", line 110, in __call__
    return self.inferencer(**kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 139, in __call__
    results = self.base_call(**kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 165, in base_call
    preds = self.forward(data, **forward_kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/apis/inferencers/video_restoration_inferencer.py", line 127, in forward
    result = self.model(
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/base_models/base_edit_model.py", line 109, in forward
    return self.forward_tensor(inputs, data_samples, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/real_esrgan/real_esrgan.py", line 112, in forward_tensor
    feats = self.generator_ema(inputs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/real_basicvsr/real_basicvsr_net.py", line 88, in forward
    residues = self.image_cleaning(lqs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/mmagic/models/editors/basicvsr/basicvsr_net.py", line 214, in forward
    return self.main(feat)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/txy/anaconda3/envs/mmpose/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 59.33 GiB (GPU 1; 23.65 GiB total capacity; 2.92 GiB already allocated; 19.91 GiB free; 2.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
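The error message itself suggests `max_split_size_mb`, which mitigates allocator fragmentation. Note, though, that a single 59.33 GiB allocation can never fit in 23.65 GiB of total capacity, so this setting alone cannot fix this particular OOM; it is shown only for completeness:

```shell
# PYTORCH_CUDA_ALLOC_CONF is PyTorch's allocator config env var; it only helps
# when fragmentation (reserved >> allocated) is the problem, which is not the
# case here. Set it before launching the inference script.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# then run the inference script as before
```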

Additional information

I want to super-resolve videos. The videos are 1 s and 5 s long, at resolutions of 1280×720 and 3160×2160. Both real_basicvsr and basicvsr++ fail with this error (screenshot attached). I switched servers and videos and the problem persists. How can I solve this?
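For scale, here is a rough, hypothetical back-of-the-envelope estimate of a single convolution's activation memory if all frames are processed in one call. The 64-channel feature width and float32 precision are assumptions about the model internals, not taken from mmagic; the point is only that the result lands in the same tens-of-GiB range as the 59.33 GiB allocation in the traceback:

```python
def conv_activation_gib(frames, height, width, channels, bytes_per_elem=4):
    """Rough size in GiB of one (frames, channels, height, width) float32
    activation tensor, ignoring everything else the model allocates."""
    return frames * channels * height * width * bytes_per_elem / 2**30

# 1 s of 30 fps video at 3160x2160 with an assumed 64-channel feature map:
# a single activation tensor already needs ~49 GiB, far beyond a 24 GiB GPU.
estimate = conv_activation_gib(frames=30, height=2160, width=3160, channels=64)
print(f'{estimate:.1f} GiB')  # ~48.8 GiB
```

Even at 1280×720, the same calculation gives several GiB per layer, which multiplies quickly across a deep network.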

txy00001 avatar Jan 25 '24 09:01 txy00001

Passing by with a reply (not official). My guess is that too many frames are being passed to the model at once. For example, for your 1 s video at 1280×720, if it has 30 frames, the required GPU memory is already substantial, likely on the order of tens of GiB. So you need to determine:

  1. When `editor.infer` is given a video, does it decode all frames and feed them to the model in a single call, or does it use the `max_seq_len` parameter to process them in windows? See the logic here: https://github.com/open-mmlab/mmagic/blob/main/mmagic/apis/inferencers/video_restoration_inferencer.py#L126
  2. If (1) passes all frames at once, the OOM is simply caused by too many frames. If it already passes only part of the frames, then at your resolution you need to reduce the window size (the `max_seq_len` parameter) to avoid the OOM. As for how to change `max_seq_len`, you will need to work out how to pass parameters to the inferencer; the overall parameter handling is at https://github.com/open-mmlab/mmagic/blob/main/mmagic/apis/mmagic_inferencer.py#L150, which appears to read a default config file.
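The windowed-call idea in point 1 can be sketched in plain Python. `chunk_indices` is a hypothetical helper, not part of mmagic; it only illustrates how a frame sequence would be split into `max_seq_len`-sized windows so that each model call sees a bounded number of frames:

```python
def chunk_indices(num_frames, max_seq_len):
    """Split frame indices [0, num_frames) into consecutive windows of at
    most max_seq_len frames each; one model call per window bounds memory."""
    return [list(range(start, min(start + max_seq_len, num_frames)))
            for start in range(0, num_frames, max_seq_len)]

# A 30-frame clip processed 8 frames at a time -> 4 windows
windows = chunk_indices(30, 8)
print(len(windows))   # 4
print(windows[-1])    # [24, 25, 26, 27, 28, 29]
```

Peak activation memory then scales with `max_seq_len` rather than with the full clip length, at the cost of shorter temporal context per window.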

Feynman1999 avatar Jan 26 '24 04:01 Feynman1999


Thanks, I'll give it a try.

txy00001 avatar Jan 26 '24 07:01 txy00001