UniversalStyleTransfer
GPU memory requirements
Hey!
I've tried running this on a single GPU with 4GB of memory, but I get:
cuda runtime error (2) : out of memory at ~/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
Before I break open my PC to install more cards, do you have a rough estimate what the GPU memory requirements are?
Hey okdewit did you ever solve this? I'm having the same problem with 8GB of GPU memory.
@Multiboxer Well, CPU mode using "normal" RAM solved it (-gpu -1). You're right though, it's not really solved yet; I think some estimated system requirements in the readme would be a useful addition.
I also keep running out of normal memory when trying to render high resolution images on the CPU, but I think that has something to do with luajit limits.
@okdewit @Multiboxer
Thanks for your suggestions on estimating the memory. High-resolution is always a challenging issue in deep models.
To run my code on GPUs with less memory, you need to reduce the image size, i.e., the parameters '-contentSize' and '-styleSize' (as shown below). I tested my code on a GPU with 12GB of memory, and the biggest size I can run is around 900.
th test_wct.lua -contentSize 256 -styleSize 256
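For a rough sense of why sizes around 900 saturate a 12GB card, here is a back-of-envelope estimate (plain Python, not part of this repo) of the VGG-19 encoder activation memory alone for one image in float32. The stage layout is standard VGG-19; this ignores the decoders, the whitening/coloring buffers, and Torch's intermediate tensors, so real usage is a multiple of these numbers:

```python
# Back-of-envelope estimate of VGG-19 encoder activation memory for one
# image at a given resolution (float32, batch size 1). Ignores decoders
# and intermediate buffers, so real usage is several times larger.

# (channels, number of conv layers) per VGG-19 stage; the spatial size
# halves after the max-pool that follows each stage.
STAGES = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]

def vgg_activation_bytes(h, w):
    total = 0
    for channels, convs in STAGES:
        total += convs * h * w * channels * 4  # 4 bytes per float32
        h, w = h // 2, w // 2                  # max-pool between stages
    return total

for size in (256, 748, 900):
    gib = vgg_activation_bytes(size, size) / 2**30
    print(f"{size}px: ~{gib:.2f} GiB of encoder activations")
```

At size 256 this comes out well under 0.1 GiB, while at size 900 it is nearly 1 GiB for the encoder activations alone, which is why the multi-level pipeline with five decoders fills a 12GB card so quickly.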
@Yijunmaverick Good to know! Could torch/tds help with the memory limit when rendering on the CPU? The speed and memory usage on a Ryzen 7 with 64GB of RAM at sizes over 1000 are very much acceptable, but it still runs into the 32-bit LuaJIT limit. (https://kvitajakub.github.io/2016/03/08/luajit-memory-limitations/)
Hello, thanks for your awesome paper and work.
I'm just curious: is this a problem with Torch? I've tested multiple image/style pairs in both the Torch and TensorFlow implementations. TensorFlow has no problem dealing with high-resolution images (both style and content) on my GTX 1080, but Torch is unable to produce anything with a contentSize above 748.
Disclaimer: I've only read the paper, not the implementations (yet).
@taesiri Yes, the Tensorflow implementation (by Evan) did some code optimizations to reduce the memory usage. Check the second paragraph in Evan's Readme:
"As in the original paper, reconstruction decoders for layers reluX_1 (X=1,2,3,4,5) are trained separately and then hooked up in a multi-level stylization pipeline in a single graph. To reduce memory usage, a single VGG encoder is loaded up to the deepest relu layer and is shared by all decoders."
@Yijunmaverick Oh, I see. Thanks for pointing that out.