
OOM for TECO GAN

Open stalagmite7 opened this issue 3 years ago • 9 comments

It seems like using even a height of 360 (while maintaining aspect ratio) for TecoGAN gives runtime OOM errors. What's the largest input size I can use to try to upscale to 4K? I imagine that to upscale to 4K I would use 1080p as the input resolution, but that's too big for the GPU to handle. Is there a way to use only the CPU for this?

stalagmite7 avatar Jun 24 '21 05:06 stalagmite7

Thanks for reporting.

It's probably because the clear_buffer option of the forward() method is not specified in the following code block: https://github.com/sony/nnabla-examples/blob/master/GANs/tecogan/generate.py#L83-L85

With .forward(clear_buffer=True), it will aggressively release unused memory in the network.

Could you try this quickly?

            pre_gen_warp.forward(clear_buffer=True)
            pre_warp.data.copy_from(pre_gen_warp.data)
        outputs.forward(clear_buffer=True)

We'll also check soon whether it works properly and reduces memory usage.

TakuyaNarihira avatar Jun 24 '21 07:06 TakuyaNarihira

Thanks for the quick response! I'm AFK right now; I'll try it in a few hours and keep you posted!

stalagmite7 avatar Jun 24 '21 07:06 stalagmite7

Tried this, and got an invalid configuration error from CUDA:

Error during forward propagation:
  TransposeCuda <-- ERROR
Traceback (most recent call last):
  File "generate.py", line 105, in <module>
    main()
  File "generate.py", line 84, in main
    pre_gen_warp.forward(clear_buffer=True)
  File "_variable.pyx", line 564, in nnabla._variable.Variable.forward
RuntimeError: target_specific error in forward_impl
/home/gitlab-runner/builds/zxvvzZDJ/0/nnabla/builders/all/nnabla-ext-cuda/src/nbla/cuda/function/./generic/transpose.cu:184
(cudaGetLastError()) failed with "invalid configuration argument" (cudaErrorInvalidConfiguration).

A cursory check suggests it could be a CUDA block-count error. I'll need to dig in further on my end later today.

stalagmite7 avatar Jun 24 '21 16:06 stalagmite7

Looks like it exceeds the limit on the number of CUDA blocks. We should introduce a grid-stride loop in the CUDA kernel. I created an issue at sony/nnabla-ext-cuda#321 (let's continue there on this specific matter).
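
For context, a grid-stride loop lets a kernel launched with a bounded number of blocks cover an arbitrarily large array: each thread handles the element at its global index and then strides forward by the total number of threads, so no block-count limit is hit. The following is only an illustrative Python sketch of that index pattern (the function name is made up for illustration); it is not the actual CUDA fix tracked in the linked issue:

    # Illustrative sketch of the grid-stride index pattern (not the actual CUDA kernel fix).
    # Each "thread" covers indices global_id, global_id + total_threads, global_id + 2*total_threads, ...
    def grid_stride_indices(global_id, total_threads, n):
        return list(range(global_id, n, total_threads))

    # Example: with 4 "threads" covering 10 elements, thread 1 handles indices 1, 5, 9.
    assert grid_stride_indices(1, 4, 10) == [1, 5, 9]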

Btw, how long is your input video sequence?

TakuyaNarihira avatar Jun 25 '21 23:06 TakuyaNarihira

Checking back in: I know it says the fix has been deployed, but the OOM error persists. As I asked before, what is the maximum size I can upscale a video to? I am trying 1080p -> 4K but I get OOM errors. It seems to work for smaller video sizes, so does that mean 1080p inputs won't be handled by this implementation?

stalagmite7 avatar Jan 31 '22 06:01 stalagmite7

@stalagmite7, is it possible to share more information about computation environment?

Srinidhi-Srinivasa avatar Feb 01 '22 11:02 Srinidhi-Srinivasa

@stalagmite7 Following are approximate memory requirements to run TeCoGAN:

Resolution    Peak Memory Usage (MB)
144p          708
280p          2816
360p          4074
480p          6818

Please note that it may not be possible to run TeCoGAN at resolutions higher than these, even on GPUs with up to 32 GB of memory.

The current pre-trained weights are in NHWC (channel-last) format, which is not supported by the CPU version. However, it is indeed possible to run inference on CPU only by transposing the weights into NCHW format and setting the channel_last flag to False in the PF.convolution calls. The following are reference materials for that: Memory-Layout-Conversion convert_parameter_format.py
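
For illustration only, here is a minimal sketch of that weight conversion, assuming the weights live in an .h5 file, that every 4-D parameter is a convolution kernel stored as (O, H, W, I) in the channel-last layout, and this particular use of the nnabla parameter API; the file names are placeholders, and the linked convert_parameter_format.py remains the authoritative reference:

    # Sketch of an NHWC -> NCHW weight conversion (assumptions: placeholder file names,
    # every 4-D parameter is a conv kernel stored as (O, H, W, I), and this use of the
    # nnabla parameter API; see the linked convert_parameter_format.py for the real script).
    import nnabla as nn

    nn.load_parameters("tecogan_model.h5")        # placeholder path to the NHWC weights
    params = {name: var.d.copy() for name, var in nn.get_parameters(grad_only=False).items()}
    nn.clear_parameters()                         # drop the channel-last parameters

    for name, w in params.items():
        if w.ndim == 4:                           # assumed: 4-D arrays are conv kernels (O, H, W, I)
            w = w.transpose(0, 3, 1, 2)           # -> (O, I, H, W) for channel_last=False
        v = nn.parameter.get_parameter_or_create(name, w.shape, need_grad=False)
        v.d = w                                   # write the transposed values back

    nn.save_parameters("tecogan_model_nchw.h5")   # load this file when running with channel_last=False

After converting, CPU-only inference would presumably also need the CPU context selected, e.g. passing the result of nnabla.ext_utils.get_extension_context('cpu') to nn.set_default_context() instead of the cudnn context.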

Srinidhi-Srinivasa avatar Feb 03 '22 08:02 Srinidhi-Srinivasa

Sorry it took me so long; the GPU is an NVIDIA 3060 Ti. The input video, as I mentioned, is 1080p resolution; you're saying this is too high for TecoGAN to process, then?

stalagmite7 avatar Feb 03 '22 16:02 stalagmite7

Yes.

Srinidhi-Srinivasa avatar Feb 03 '22 16:02 Srinidhi-Srinivasa