
Unable to produce images > 800²px on cards with more than 4GB RAM

Open neuralisator opened this issue 8 years ago • 48 comments

A couple of users (here, here and here), including myself, have experienced the problem that they cannot create images with resolutions larger than around 800²px even though the GPU RAM should allow more than that. I am running a script that finds the maximum producible image size by bisection: images are scaled to a candidate size and the conversion script is run against them; depending on whether it fails or succeeds, the size is adapted until a working image size is found. As the image size is reduced on every failure, a too-large image size can be ruled out as the cause of the error, since a size that works will always be found. I am running this on a Titan X with 12GB RAM and monitoring it using nvidia-smi. The tested images with sizes above 800²px do consume more than 4GB of memory, but even if they don't max out the 12GB, they still fail to convert unless the size is back down to what I could formerly also produce on a 4GB GPU. An exception is thrown at some point when the script is updating the layer "deeppy.feedforward.activation_layers.ReLU".
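The search script works roughly like this (a simplified sketch; `try_convert` is a hypothetical callback that runs the style transfer at a given square size and reports success):

```python
def find_max_size(low, high, try_convert, tolerance=8):
    """Binary-search the largest square image size (in px) that converts
    without error. try_convert(size) is a hypothetical callback that runs
    the conversion at size x size and returns True on success."""
    best = None
    while high - low > tolerance:
        mid = (low + high) // 2
        if try_convert(mid):
            best = mid
            low = mid   # success: try a larger size
        else:
            high = mid  # failure: try a smaller size
    return best
```

Because the search shrinks the size after every failure, any genuinely producible size is eventually found, which is why "image too large" can be ruled out as the cause.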

Others reported to be able to create images with larger sizes (here and here). @alexjc noted that he had to "manually free some buffers" to create larger images, maybe he can give us some insights?

I am running Linux mint 17 / nvidia driver version 352.93 / cuda 7.5 / cudnn5 with cudarray master (w/ cudnn5 support, but the error occurs also with cudnn disabled) / deeppy master.

neuralisator avatar Jun 01 '16 19:06 neuralisator

One thing I've noticed when running nvidia-smi -lms 100 (100 ms update interval) is that you can see the memory usage 'peak' for a moment. So while your image may only require 4GB of RAM, actual usage will briefly peak at something much higher (and if that exceeds your RAM, it crashes out).

I'll let @andersbll comment more completely, but it seems like it is probably related to how the cudarray and deeppy libraries handle memory. Keep in mind cudarray and deeppy are pretty much solo-developer frameworks, so probably not as refined as torch or caffe. There is a torch implementation of style transfer; you may want to try that as well. Torch is a much more mature deep learning framework, so it probably handles memory allocations better.

At the very least, give it a shot and see if it handles 800 x 800 images.

filmo avatar Jun 01 '16 20:06 filmo

Thanks for the recommendations, @filmo. I have actually tested a couple of different implementations, including that one (neural-style adam/lbfgs, neural-art, neural-style-tf). That said, neural_artistic_style in my eyes yields by far the best results of them all, which is pretty damn impressive given the solo-developer background. It is all the more a pity that it is the only tested implementation that suffers from this limit (at least in some configurations).

Also, I want to say once again that I do not think this is a memory spike. As I said, I am automatically testing many different image sizes, and a 4GB card can produce the exact same maximum size as a 12GB one.

neuralisator avatar Jun 01 '16 20:06 neuralisator

To clarify, I do not completely rule out a memory spike, but it wouldn't be one on a "regular" scale. A spike so massive that it never occurs at <= 4GB but, immediately above that, fills up the remaining 8GB of the card would just be another description of the same problem.

neuralisator avatar Jun 01 '16 21:06 neuralisator

Sorry I wasn't clear on what you were asking.

I'm using a 980ti with 6GB and I'm able to apply style to images that are 1600 x 1057 and 1750 x 976 in size respectively. I think my 1600 x 1057 is about as big as I can go using this implementation.

(I'm using the cudnn4 version of cudarray and deeppy from back in February and have not yet upgraded to cudnn5. Perhaps that's part of the difference? I also wonder if there's something particular about the Titan?)

I agree, I also prefer the images created by this implementation over neural_style. Not sure why there are significant differences, but there definitely are.

filmo avatar Jun 02 '16 02:06 filmo

Hi! Thanks for the nice writeup, I wish I could reproduce the error myself. I think you are right @neuralisator, this does not look like an out-of-memory problem. Is this the problem we are trying to solve? In that case, could you insert a print(shape) just before line 45 in cudarray/linalg.py? I would like to start out by checking the arguments to cudarray.empty().

Regarding the visual style, I perform layer-level normalization of gradients. In the VGG-net, the features in each layer may exist on different scales. By L1-normalizing the gradient signals, I get a more even contribution across the different layers. I suspect this is the secret sauce. :)
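A minimal NumPy sketch of that idea (the names here are illustrative, not the actual deeppy/cudarray code): scale each layer's gradient so its mean absolute value is 1 before the layer contributions are combined.

```python
import numpy as np

def l1_normalize_gradient(grad, eps=1e-8):
    """Scale a layer's gradient so its mean absolute value (L1 scale) is 1.
    A sketch of the layer-wise normalization described above; the actual
    implementation may differ in detail."""
    scale = np.mean(np.abs(grad)) + eps  # eps guards against all-zero gradients
    return grad / scale
```

With this, a layer whose features live on a large numeric scale no longer drowns out the gradient signal from layers on a smaller scale.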

andersbll avatar Jun 02 '16 06:06 andersbll

Hello @andersbll and thanks for replying so quickly. Yes, the error you linked is the one it's about.

Before we get to debugging:

About the quality of the images: the style is just applied so much better than in any other implementation :) Plus, it's super fast. Comparison is a little tricky here, of course; basically it comes down to: what do you get from the different implementations after the same amount of time has passed? From what I see, it beats the competition there as well. At this point I don't understand the technical details; I can just say that this is an exceptionally great piece of work and I would love to see it work with high-resolution images. Which leads me to a last question, since you said you weren't able to reproduce the problem: can you actually produce images > 800²px (such as those in this test), or are you running a 4GB card on which only the "769px" test worked?

Now let's debug:

I think you meant print(out_shape)? I modified the code so it looks like this:

def dot(a, b, out=None):
    if a.ndim == b.ndim == 1:
        return inner(a, b)

    if a.dtype != b.dtype:
        raise ValueError('dtype mismatch')

    out_shape = matmul_shape(a.shape, b.shape)
    print('out:', out, ', a.dtype: ', a.dtype, ', out_shape:', out_shape)
    if out is None:
        out = cudarray.empty(out_shape, dtype=a.dtype)
    else:

I also added a logline to show the currently processed layer (in style_network.py).

I resized the tuebingen / starry_night images to an adequate size for testing. They can be downloaded here and here. Memory consumption ends up at about 5.2GB with those two images (see below).

This is the output: output.txt

And these are the memory stats during the process: memory.txt

neuralisator avatar Jun 02 '16 18:06 neuralisator

Regarding speed, maybe the other implementations are using a different optimization method. The original paper uses L-BFGS as far as I remember. It is a bit heavy compared to the first-order method I use (Adam).
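For illustration, a single Adam update step looks like this (a generic sketch of the optimizer, not the deeppy implementation); unlike L-BFGS it needs only two extra buffers the size of the parameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. m and v are running first/second moment estimates,
    t is the 1-based step count used for bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Each step costs one gradient evaluation and a few elementwise operations, whereas L-BFGS maintains a history of past gradients and performs line searches, which is what makes it comparatively heavy.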

I have a GPU with 12 GB RAM and I can produce images larger than 800^2 pixels including the images you have attached.

Thanks for the output.txt. I can't see anything wrong there.

Can you provide me with the output of the commands ldd <path to libcudarray.so> and uname -a and python -V?

andersbll avatar Jun 03 '16 06:06 andersbll

I hope the formatting works better this time:

$ ldd /usr/local/lib/libcudarray.so
    linux-vdso.so.1 =>  (0x00007fff3f75e000)
    libcudart.so.7.5 => /usr/local/cuda/lib64/libcudart.so.7.5 (0x00007f1464a34000)
    libcublas.so.7.5 => /usr/local/cuda/lib64/libcublas.so.7.5 (0x00007f1463154000)
    libcurand.so.7.5 => /usr/local/cuda/lib64/libcurand.so.7.5 (0x00007f145f8ec000)
    libcudnn.so.5 => /usr/local/cuda/lib64/libcudnn.so.5 (0x00007f145bda1000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f145ba73000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f145b76d000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f145b557000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f145b191000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f145af8d000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f145ad6f000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f145ab66000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1465816000)

$ uname -a
Linux base 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

(this is a Mint 17 installation)

$ python -V
Python 2.7.6

neuralisator avatar Jun 03 '16 09:06 neuralisator

Thanks, can you try to install Anaconda Python and use that instead?

andersbll avatar Jun 03 '16 09:06 andersbll

Will do and report back, it can take a while though.

neuralisator avatar Jun 03 '16 09:06 neuralisator

Setting everything up with anaconda did the trick. It's beautiful :D Thanks @andersbll - I will give more details tomorrow.

neuralisator avatar Jun 04 '16 01:06 neuralisator

Hooray, I'm glad to hear that! What version of the Cython package is your old Python installation using? Could you try updating it to the latest and see if that helps?

andersbll avatar Jun 04 '16 06:06 andersbll

The version that is used by the default installation is 0.24.

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Cython
>>> print(Cython.__version__)
0.24

Now here's the funny part: I have been writing an install script to do the steps that I did, for reference, and now it's broken again. I can't really make sense of it. It definitely worked, I created several high resolution images. And now I'm getting the exact same error with the anaconda installation. So I'll be busy for a bit, banging my head against the table and trying to figure out what's going on there. I'll report back.

neuralisator avatar Jun 04 '16 13:06 neuralisator

Sadly, I couldn't get it to work in a reproducible way. As I wasn't (and am not) deeply familiar with Python package management, the different distributions and the library paths, I probably made a big mess at some point in this installation orgy and accidentally hit a working combination a second time. In retrospect, I shouldn't have touched it anymore, but I wanted an install script that works reproducibly, and installing from that broke it a second time; now I am basically back to zero and get the error message every single time. By now I have scripted this and have installed with pretty much every possible combination of versions/settings I can imagine.

The installation process is as follows:

  • install cuda toolkit (7.5) + cudnn (4 or 5) (/usr/local/cuda)
  • install anaconda (~/anaconda2)
  • optionally, create a new conda environment (~/anaconda2/envs/...) and activate it
  • install cudarray (commit before cudnn5 merge if cudnn4, or else master)

Without creating a conda environment, just prepending anaconda2/bin to PATH, this installs cudarray with numpy 1.10-4 and cython 0.23-4. I can upgrade to numpy 1.11 and cython 0.24 using conda install, and I have tried all combinations. I actually started with the virtual environment and installed the newer versions, which didn't work. Then I used the old versions by setting PYTHONPATH, and somewhere after that it worked. So I figured the older versions were working and the newer ones weren't. Well, there was obviously more to it than that. Note that I cleared out the installation directories every time, so there wouldn't be any remainders from previous tries.

With a separate conda environment, I didn't have the old lib versions available unless I set PYTHONPATH=~/anaconda2/lib/python2.7/site-packages, so installing the newer packages was mandatory there; scipy also had to be installed later. With libcudarray.so installed to ~/anaconda2/lib (INSTALL_PREFIX), it wouldn't be found unless I copied it to e.g. /usr/local/lib, so I did this after every compile. I tried both CUDNN_ENABLED=1 and =0.

Finally,

  • install deeppy
  • run test

I have even run this in nvidia-docker with both regular Python and Anaconda exclusively installed, and again, no luck. What bugs me the most is that I had a working configuration twice, and I wrecked it again. I am probably missing or doing something fundamentally wrong here. As I am running out of ideas, any thoughts are appreciated.

neuralisator avatar Jun 05 '16 23:06 neuralisator

To be clear, INSTALL_PREFIX of libcudarray.so would be ~/anaconda2/ and the file would reside in ~/anaconda2/lib then.

neuralisator avatar Jun 06 '16 00:06 neuralisator

Ok, thanks for the thorough description. I might try to use some other Python installations and see if I can reproduce the error.

Just to be sure, you are using a 64-bit version of Python, right? A 32-bit one might be problematic above 4GB. :)

andersbll avatar Jun 06 '16 06:06 andersbll

Yes, both the stock and the Anaconda python2.7 are reported as 64-bit LSB executables.

neuralisator avatar Jun 06 '16 11:06 neuralisator

In the meantime I installed it on a fresh Linux Mint 17 and a Debian 7.1 installation. The Anaconda variant on Mint failed. On Debian I used pip to install the libraries, which for a change gave me numpy 1.11.1rc1; it also failed. Running the Anaconda version on Debian failed as well.

If I didn't have the images I created back then, I'd start to think I hallucinated it ever working. How can it fail on every installation? How can it actually have worked at some point? It must have picked up some libraries that were already present on the system or something.

This all makes no sense to me.

neuralisator avatar Jun 06 '16 17:06 neuralisator

I just ran the conversion test with cuda-memcheck, and when it crashes it puts out many of the following error messages. What differs is the thread number, so I assume every thread crashes with the same error. Maybe that is of some help?

('layer ', <style_network.Convolution object at 0x7f7ea65dc9d0>)
========= Invalid __global__ read of size 4
=========     at 0x00001130 in void cudarray::kernel_win2img<float>(float const *, int, int, int, int, int, int, int, int, int, int, int, int, cudarray::kernel_win2img<float>*)
=========     by thread (59,0,0) in block (1242,0,0)
=========     Address 0x73fa0f748 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x15865d]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 [0x146ad]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 (cudaLaunch + 0x143) [0x2ece3]
=========     Host Frame:/home/alex/anaconda2/lib/libcudarray.so (_ZN8cudarray7win2imgIfEEvPKT_iiiiiiiiiPS1_ + 0x313) [0xb0d93]
=========     Host Frame:/home/alex/anaconda2/lib/libcudarray.so (_ZN8cudarray27conv_bc01_matmul_bprop_imgsIfEEvPKT_S3_iiiiiiiiiiiPS1_ + 0x1d7) [0x42377]
=========     Host Frame:/home/alex/anaconda2/lib/python2.7/site-packages/cudarray-0.1.dev0-py2.7-linux-x86_64.egg/cudarray/wrap/nnet.so [0xaf8e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x88e5) [0xfce15]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8525) [0xfca55]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8665) [0xfcb95]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8665) [0xfcb95]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8525) [0xfca55]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCode + 0x32) [0xfdb42]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyRun_FileExFlags + 0xb0) [0x11e050]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyRun_SimpleFileExFlags + 0xef) [0x11e22f]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (Py_Main + 0xca4) [0x133b74]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:python [0x649]
=========
========= Invalid __global__ read of size 4
=========     at 0x00001130 in void cudarray::kernel_win2img<float>(float const *, int, int, int, int, int, int, int, int, int, int, int, int, cudarray::kernel_win2img<float>*)
=========     by thread (58,0,0) in block (1242,0,0)
=========     Address 0x73fa0f744 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x15865d]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 [0x146ad]
=========     Host Frame:/usr/local/cuda/lib64/libcudart.so.7.5 (cudaLaunch + 0x143) [0x2ece3]
=========     Host Frame:/home/alex/anaconda2/lib/libcudarray.so (_ZN8cudarray7win2imgIfEEvPKT_iiiiiiiiiPS1_ + 0x313) [0xb0d93]
=========     Host Frame:/home/alex/anaconda2/lib/libcudarray.so (_ZN8cudarray27conv_bc01_matmul_bprop_imgsIfEEvPKT_S3_iiiiiiiiiiiPS1_ + 0x1d7) [0x42377]
=========     Host Frame:/home/alex/anaconda2/lib/python2.7/site-packages/cudarray-0.1.dev0-py2.7-linux-x86_64.egg/cudarray/wrap/nnet.so [0xaf8e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x88e5) [0xfce15]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8525) [0xfca55]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8665) [0xfcb95]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8665) [0xfcb95]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalFrameEx + 0x8525) [0xfca55]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCodeEx + 0x89e) [0xfda2e]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyEval_EvalCode + 0x32) [0xfdb42]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyRun_FileExFlags + 0xb0) [0x11e050]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (PyRun_SimpleFileExFlags + 0xef) [0x11e22f]
=========     Host Frame:/home/alex/anaconda2/bin/../lib/libpython2.7.so.1.0 (Py_Main + 0xca4) [0x133b74]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
=========     Host Frame:python [0x649]

...

neuralisator avatar Jun 06 '16 17:06 neuralisator

I think I just found the solution. I didn't export the CUDNN_ENABLED=1 environment variable when installing cudarray; I only ran CUDNN_ENABLED=1 make. Then I noticed that setup.py checks for the variable again. So I probably manually exported it at some point in between, which is why it worked. Now it's working again with the Anaconda installation. I haven't checked anything else yet, I just wanted to post that. I will not touch this setup now and will verify it in a Docker image or on another system later. So basically I failed to follow the installation instructions. Sorry for that. Will post more results later.

neuralisator avatar Jun 06 '16 23:06 neuralisator

Verified :D this was the problem. It also works with stock python now. However, the error still occurs if CUDNN_ENABLED=0 is set (or not set at all) during cudarray installation. So this might still be worth looking into. @andersbll let me know if you require any more info. Also thanks for your support, and keep up the good work :) I'll ping 2 more people who seemed to have the same problem: @FabienLavocat and @mirzman

neuralisator avatar Jun 07 '16 01:06 neuralisator

Ah, great job finding it finally! I admire your persistence. :)

I will try to look into kernel_win2img() at a later point. It seems like this is the problem.

andersbll avatar Jun 07 '16 08:06 andersbll

I exported CUDNN_ENABLED=1. But the problem stays...

mirzman avatar Jun 07 '16 11:06 mirzman

@mirzman, just to make sure: export CUDNN_ENABLED=1 has to be set before compiling/installing libcudarray (for both make and setup.py). If you compiled it in the same folder before, I suggest deleting that folder and creating a fresh clone of the repo. I haven't tested whether you also have to reinstall deeppy, but the same might apply there. Also make sure the freshly compiled libcudarray.so is actually the one being used, and not an old version lingering somewhere; cuda-memcheck shows which one is loaded (see above).

neuralisator avatar Jun 07 '16 16:06 neuralisator

git clone https://github.com/andersbll/cudarray.git
git clone https://github.com/andersbll/deeppy.git
git clone https://github.com/andersbll/neural_artistic_style.git

export CUDNN_ENABLED=1

cd cudarray
make -j8 -B
sudo make install
sudo python setup.py install
cd ..

cd deeppy
sudo python setup.py install
cd ..

cd neural_artistic_style
./neural_artistic_style.py --network ~/imagenet-vgg-verydeep-19.mat --iterations 201 --subject ~/chern_s9.jpg --style images/starry_night.jpg --output o9.png
cd ..

it fails

sudo mv /usr/local/lib/libcudarray.so /usr/local/lib/libcudarray.so1

cd neural_artistic_style
./neural_artistic_style.py --network ~/imagenet-vgg-verydeep-19.mat --iterations 201 --subject ~/chern_s9.jpg --style images/starry_night.jpg --output o9.png
cd ..

sudo mv /usr/local/lib/libcudarray.so1 /usr/local/lib/libcudarray.so

it outputs "CUDArray: CUDA back-end not available, using NumPy."

mirzman avatar Jun 07 '16 17:06 mirzman

@mirzman if I export CUDNN_ENABLED=1 here on a regular user account and then sudo, the variable is not set in that context. Please check whether -DCUDNN_ENABLED shows up in the compiler output. If not, that is probably your issue.

neuralisator avatar Jun 07 '16 17:06 neuralisator

Correction: as you don't run sudo before make, that part should be fine. But you can't run setup.py with sudo like this; it's the exact same situation that I had in the end.

neuralisator avatar Jun 07 '16 17:06 neuralisator

If you install with sudo you need to export the environment variables as well. This is done with sudo -E.
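For reference, a minimal sketch of an install sequence where the variable survives the sudo boundary (directory names follow the clone commands above):

```shell
# Export once so both make and the sudo'ed setup.py see the variable;
# sudo -E preserves the caller's environment across the privilege switch.
export CUDNN_ENABLED=1

cd cudarray
make
sudo -E make install
sudo -E python setup.py install
```

Without -E, sudo resets the environment, so setup.py would silently build without cuDNN support even though make used it.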

andersbll avatar Jun 07 '16 17:06 andersbll

make outputs:

g++ -DCUDNN_ENABLED -O3 -fPIC -Wall -Wfatal-errors -I./include -I/usr/local/cuda/include -c -o src/nnet/conv_bc01_matmul.o src/nnet/conv_bc01_matmul.cpp
nvcc -gencode arch=compute_20,code=sm_20 -gencode arch=compute_20,code=compute_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -O3 --compiler-options '-DCUDNN_ENABLED -O3 -fPIC -Wall -Wfatal-errors' --ftz=true --prec-div=false -prec-sqrt=false --fmad=true -I./include -I/usr/local/cuda/include -c -o src/nnet/pool_b01.o src/nnet/pool_b01.cu
...

also:

$ echo $CUDNN_ENABLED
1
$ sudo echo $CUDNN_ENABLED
1

mirzman avatar Jun 07 '16 17:06 mirzman

@mirzman: what is the output of ldd <path to libcudarray.so>? I think you need to update the environment variable LD_LIBRARY_PATH to point to the correct libraries.
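As a sketch, assuming libcudarray.so was installed to /usr/local/lib (adjust to your INSTALL_PREFIX), the linker path can be set and the dependencies checked like this:

```shell
# Point the dynamic linker at the freshly installed library, then
# verify that all of libcudarray.so's dependencies resolve
# (any "not found" line indicates a missing or misplaced library).
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
ldd /usr/local/lib/libcudarray.so
```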

andersbll avatar Jun 07 '16 17:06 andersbll