online-neural-doodle
Out of memory error
When I execute
CUDA_VISIBLE_DEVICES=0 th feedforward_neural_doodle.lua -model_name skip_noise_4 -masks_hdf5 data/starry/gen_doodles.hdf5 -batch_size 4 -num_mask_noise_times 0 -num_noise_channels 0 -learning_rate 1e-1 -half false
I get the following result:
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded data/pretrained/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Setting up style layer 2 : relu1_1
Replacing max pooling at layer 5 with average pooling
Setting up style layer 7 : relu2_1
Replacing max pooling at layer 10 with average pooling
Setting up style layer 12 : relu3_1
Replacing max pooling at layer 19 with average pooling
Setting up style layer 21 : relu4_1
Replacing max pooling at layer 28 with average pooling
Setting up style layer 30 : relu5_1
Optimize
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-7288/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
/home/andrew/torch/install/bin/luajit: /home/andrew/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
In 1 module of nn.Sequential:
/home/andrew/torch/install/share/lua/5.1/nn/THNN.lua:109: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7288/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'v'
/home/andrew/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'SpatialReplicationPadding_updateGradInput'
...h/install/share/lua/5.1/nn/SpatialReplicationPadding.lua:41: in function 'updateGradInput'
/home/andrew/torch/install/share/lua/5.1/nn/Module.lua:31: in function </home/andrew/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/home/andrew/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/andrew/torch/install/share/lua/5.1/nn/Sequential.lua:88: in function </home/andrew/torch/install/share/lua/5.1/nn/Sequential.lua:78>
[C]: in function 'xpcall'
/home/andrew/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/andrew/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
feedforward_neural_doodle.lua:167: in function 'opfunc'
/home/andrew/torch/install/share/lua/5.1/optim/adam.lua:33: in function 'optim_method'
feedforward_neural_doodle.lua:199: in main chunk
[C]: in function 'dofile'
...drew/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
I'm running with multiple GTX 980s, so GPU memory should not be an issue.
I have tried running with both -backend cudnn and -backend nn, with no difference in the outcome.
I have been able to run the fast-neural-doodle project without problems on this machine, so prerequisites such as Python, Torch, and CUDA appear to be set up correctly.
Any idea what is causing this problem?
Hello, I tested everything using a 12 GB card, so all the parameters are tuned to work in my setting. Note that with CUDA_VISIBLE_DEVICES=0 the process sees only one card, and a single GTX 980 has 4 GB, well below the 12 GB these defaults were tuned for. You can try decreasing batch_size to 1 to see whether it still fails, but training will be much worse with such a small batch size.
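For concreteness, here is the same invocation from your report with only the batch size changed; if this runs, memory is the bottleneck and you can search upward for the largest batch size that still fits:

CUDA_VISIBLE_DEVICES=0 th feedforward_neural_doodle.lua -model_name skip_noise_4 -masks_hdf5 data/starry/gen_doodles.hdf5 -batch_size 1 -num_mask_noise_times 0 -num_noise_channels 0 -learning_rate 1e-1 -half false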
You can also reduce the image size to decrease memory consumption. I used 512x512 images; any dimensions that are a multiple of 32 will work, so try 384x384, for example, with the same batch_size.
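If you want to confirm how much memory is actually free on the visible card before training, cutorch can report it from the Torch REPL. This is a generic diagnostic sketch, not part of the doodle scripts:

require 'cutorch'
-- getMemoryUsage returns free and total bytes for a device id
local dev = cutorch.getDevice()
local freeBytes, totalBytes = cutorch.getMemoryUsage(dev)
print(string.format('GPU %d: %.1f GB free of %.1f GB total',
                    dev, freeBytes / 2^30, totalBytes / 2^30))

Comparing the reported free memory against what a run needs makes it clear whether batch size, image size, or both have to come down.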