UniversalStyleTransfer
GPU memory requirements
Hey!
I've tried running this on a single GPU with 4GB of memory, but I get:
cuda runtime error (2) : out of memory at ~/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66
Before I break open my PC to install more cards, do you have a rough estimate what the GPU memory requirements are?
Hey okdewit did you ever solve this? I'm having the same problem with 8GB of GPU memory.
@Multiboxer Well, CPU mode using "normal" RAM solved it (-gpu -1). You're right though, it's not really solved yet; I think some estimated system requirements in the readme would be a useful addition.
I also keep running out of normal memory when trying to render high resolution images on the CPU, but I think that has something to do with luajit limits.
@okdewit @Multiboxer
Thanks for your suggestions on estimating the memory. High-resolution is always a challenging issue in deep models.
To run my code on GPUs with less memory, you need to reduce the image size, i.e., the parameters '-contentSize' and '-styleSize' (as shown below). I tested my code on a GPU with 12GB of memory, and the biggest size I can run is around 900.
th test_wct.lua -contentSize 256 -styleSize 256
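For a rough sense of why sizes around 900 saturate a 12GB card, here is a back-of-envelope estimate (plain Python, not part of this repo) of the VGG-19 encoder activation memory alone for one image in float32. The stage layout is standard VGG-19; this ignores the decoders, the whitening/coloring buffers, and Torch's intermediate tensors, so real usage is a multiple of these numbers:

```python
# Back-of-envelope estimate of VGG-19 encoder activation memory for one
# image at a given resolution (float32, batch size 1). Ignores decoders
# and intermediate buffers, so real usage is several times larger.

# (channels, number of conv layers) per VGG-19 stage; the spatial size
# halves after the max-pool that follows each stage.
STAGES = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]

def vgg_activation_bytes(h, w):
    total = 0
    for channels, convs in STAGES:
        total += convs * h * w * channels * 4  # 4 bytes per float32
        h, w = h // 2, w // 2                  # max-pool between stages
    return total

for size in (256, 748, 900):
    gib = vgg_activation_bytes(size, size) / 2**30
    print(f"{size}px: ~{gib:.2f} GiB of encoder activations")
```

At size 256 this comes out well under 0.1 GiB, while at size 900 it is nearly 1 GiB for the encoder activations alone, which is why the multi-level pipeline with five decoders fills a 12GB card so quickly.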
@Yijunmaverick Good to know! Could torch/tds help with the memory limit when rendering on the CPU? The speed and memory usage on a Ryzen 7 with 64GB of RAM at sizes over 1000 are very much acceptable, but it still runs into the 32-bit LuaJIT limit. (https://kvitajakub.github.io/2016/03/08/luajit-memory-limitations/)
Hello, thanks for your awesome paper and work.
I'm just curious: is this a problem with Torch? I've tested multiple image/style pairs in both the Torch and TensorFlow implementations. TensorFlow has no problem dealing with high-resolution images (both style and content) on my GTX 1080, but Torch is unable to produce anything with a contentSize above 748.
Disclaimer: I've only read the paper, not the implementations (yet).
@taesiri Yes, the Tensorflow implementation (by Evan) did some code optimizations to reduce the memory usage. Check the second paragraph in Evan's Readme:
"As in the original paper, reconstruction decoders for layers reluX_1 (X=1,2,3,4,5) are trained separately and then hooked up in a multi-level stylization pipeline in a single graph. To reduce memory usage, a single VGG encoder is loaded up to the deepest relu layer and is shared by all decoders."
@Yijunmaverick Oh, I see. Thanks for pointing that out.