online-neural-doodle

Training a model fails

Open randomrandom opened this issue 9 years ago • 9 comments

Hi, I tried to run the command from the tutorial for model training, but it failed with the following error:

 CUDA_VISIBLE_DEVICES=0 th feedforward_neural_doodle.lua -model_name skip_noise_4 -masks_hdf5 data/starry/gen_doodles.hdf5 -batch_size 4 -num_mask_noise_times 0 -num_noise_channels 0 -learning_rate 1e-1 -half false
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/hdf5/group.lua:312: HDF5Group:read() - no such child 'style_img' for [HDF5Group 33554432 /]
stack traceback:
    [C]: in function 'error'
    /root/torch/install/share/lua/5.1/hdf5/group.lua:312: in function 'read'
    feedforward_neural_doodle.lua:49: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

any ideas why hdf5 might fail with such error?

randomrandom avatar Jul 15 '16 20:07 randomrandom
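For reference, one quick way to rule out an empty or truncated output file (e.g. if `generate.py` was interrupted) is to check for the fixed 8-byte signature that every valid HDF5 file begins with. This is a standard-library-only sketch; a file that passes this check can still lack the `style_img` child the training script expects, which a tool like h5py can confirm by listing the file's children.

```python
# Sanity-check sketch: a valid HDF5 file starts with the 8-byte signature
# \x89HDF\r\n\x1a\n. This only detects a missing/empty/truncated file; a
# file that passes can still be missing the 'style_img' group that
# feedforward_neural_doodle.lua reads.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

def looks_like_hdf5(path):
    """Return True if the file exists and begins with the HDF5 signature."""
    try:
        with open(path, "rb") as f:
            return f.read(8) == HDF5_MAGIC
    except OSError:
        return False

# e.g. looks_like_hdf5("data/starry/gen_doodles.hdf5")
```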

did you generate hdf5 file first?

DmitryUlyanov avatar Jul 16 '16 08:07 DmitryUlyanov

yes, initially I thought that something in the generation didn't go well, since this script never completed:

 python generate.py --n_jobs 30 --n_colors 4 --style_image data/starry/style.png --style_mask data/starry/style_mask.png --out_hdf5 data/starry/gen_doodles.hdf5

even though a new hdf5 file was generated.

So I decided to try the sample command that you have put in the README - so it should use the sample hdf5 file from the repo, unfortunately it made no difference.

Is it possible that the two fail due to bad hdf5 setup?

randomrandom avatar Jul 16 '16 11:07 randomrandom

There's no sample hdf5 file, since it is too large. You should let the script run until it finishes.

DmitryUlyanov avatar Jul 16 '16 17:07 DmitryUlyanov

thanks, I'll try that! How much time does it take on your setup?

Do you advise increasing the number of jobs? I'm using a Tesla K10 setup.

randomrandom avatar Jul 16 '16 17:07 randomrandom

I managed to get it working; unfortunately it looks like the VRAM (3.5GB) is not enough. What's the best way to reduce the memory footprint?

p.s.: I'm familiar with Johnson's implementation and know what I can do there, but I still haven't read your blogpost and the code documentation :(

Edit 1: At first glance, it looks like reducing the batch_size and n_colors might do the trick? I had increased them to 8; maybe that's why it fails.

Edit 2: Is it even possible to squeeze the training into 3.5GB? I started going through the code and noticed that you are already doing a lot of memory optimizations (e.g. using cudnn and the ADAM optimizer).

randomrandom avatar Jul 16 '16 18:07 randomrandom

Try batch_size = 1 and do not change n_colors; you can also downsize the image, to 256x256 for example.

DmitryUlyanov avatar Jul 16 '16 21:07 DmitryUlyanov
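As a rough illustration of why those two knobs help: for convolutional feature maps, activation memory scales roughly linearly with batch size and with pixel count. The sketch below is an assumption-laden back-of-the-envelope estimate (the reference configuration of batch 4 at 512x512 is hypothetical, not taken from the repo), not a measurement.

```python
# Back-of-the-envelope sketch: treat activation memory as linear in batch
# size and in pixel count (side**2). The reference config (batch 4 at
# 512x512) is an illustrative assumption, not a measured baseline.
def relative_memory(batch_size, side, ref_batch=4, ref_side=512):
    """Activation memory relative to the reference configuration."""
    return (batch_size / ref_batch) * (side / ref_side) ** 2
```

Under these assumptions, dropping to batch_size=1 alone cuts activation memory to a quarter, and combining it with 256x256 inputs cuts it to one sixteenth, which is why those were the first things to try on a 3.5GB card.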

looks like batch_size=1 did the trick; I previously tried 2 and 3 with no success. Does this affect the quality or just the speed of the training?

randomrandom avatar Jul 17 '16 06:07 randomrandom

The quality will be ok, I used batch_size = 1, but at test time you need to experiment with model:evaluate() or model:training().

DmitryUlyanov avatar Jul 17 '16 08:07 DmitryUlyanov

BTW, do you recommend this repo for artistic neural transfer? To do it well, there should probably be some semantic analysis that determines the masks. Is there any other approach you can recommend?

randomrandom avatar Jul 18 '16 17:07 randomrandom