
Where should I start if I want to train a model for usage with Neural-Style?

ProGamerGov opened this issue 9 years ago • 131 comments


Are Network In Network (NIN) models easier to train than VGG models?

Does anyone know of any guides that cover training a model that is compatible with Neural-Style from start to finish? If not, then what do I need to look for in order to make sure the model I am learning to train is compatible with Neural-Style?

What is the easiest way to train a model for use with neural-style? Are there any AMIs available that will let me start messing around with training right away?

ProGamerGov avatar Jul 24 '16 21:07 ProGamerGov

There are at least two parts to this question:

  • training a model that works technically in neural-style
  • creating models which produce adequate quality images using neural-style

One has to start from the technical part. Caffe http://caffe.berkeleyvision.org is a good choice to start with. It is not too difficult to install, no coding is needed to use it and it directly produces caffemodel files. To train a model, one needs

  • the training and testing datasets, in practice images and a label for each image; these are then converted into an LMDB database
  • a training prototxt file describing the architecture of the model etc.
  • a solver configuration file

With these in place, training using caffe will create a model initialized with random weights (according to what is stated in the prototxt file) and start training it using the dataset.

Training a deep network from scratch can be difficult and time-consuming. One might start with a small model first, with only a limited number of convolutional layers, or one might try finetuning an existing model. Finetuning means taking an existing, already trained model and training it further using a different dataset. Like in this example http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html .
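To make the mechanics concrete, here is a minimal pycaffe sketch of both routes (the file names are placeholders; the command-line "caffe train" tool used later in this thread does the same job):

import caffe

# run on the GPU; use caffe.set_mode_cpu() if no GPU is available
caffe.set_mode_gpu()

# the solver prototxt points at the training prototxt and the LMDBs
solver = caffe.SGDSolver('solver.prototxt')

# for finetuning, copy weights from an existing caffemodel first;
# leave this line out to train from the random initialization
# specified by the weight_filler entries in the prototxt
#solver.net.copy_from('pretrained.caffemodel')

solver.solve()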

Either way, one can without much difficulty create models that work with neural-style, in the sense that the model loads, iterations start and even the losses may start diminishing. The visual results are often a disappointment, however. I have done this several times already, using wikiart, my own photo library and a programmatically created dataset of geometrical images. Nothing really useful yet, but learning all the time.

Some more detailed notes: for VGG networks, it looks like training prototxt files are not available on the web, but I managed to piece together one that works. Training a VGG network from scratch is not really recommended. From what I have heard, the creators of the model couldn't train the deeper models from scratch, but had to train smaller models first and then add layers for a new training round. But maybe try a VGG with only the 1st and 2nd conv layer levels as a first experiment. Or a VGG finetuned on one's own dataset.

htoyryla avatar Jul 25 '16 06:07 htoyryla

I successfully trained a model that is similar to NIN but with fewer layers, and it produced the following images after training for 70,000 iterations:

https://imgur.com/a/sYRhV

I used the CIFAR10 data set and this GitHub page, along with the supplied scripts in /home/ubuntu/caffe/examples/cifar10.

https://gist.github.com/mavenlin/d802a5849de39225bcc6

I am currently wondering if there is a data set of artwork available that I could use for training.

I found this data set: http://people.bath.ac.uk/hc551/dataset.html but that's all I have been able to find thus far in the way of artwork data sets. I was also considering grabbing all the images posted to /r/art/ on Reddit for use in training. Maybe also using my massive collection of styles as well.

ProGamerGov avatar Jul 25 '16 06:07 ProGamerGov

Your results look familiar to me. They can be interesting as such, but if the model does not respond to the different styles, then what it can achieve is very limited.

I cannot now locate the example from where I obtained the wikiart materials. It was not a caffe example, if I remember correctly; more like someone's python project, from which I got a list of wikiart urls with label data. Not all urls worked, but out of those which did I put together an LMDB. I'll look further and see if I find something.

htoyryla avatar Jul 25 '16 06:07 htoyryla

Here's one of my results:

[image: sh3-i19800-paasikivi-feininger-cl23sl124-cw200sw100_150]

Only the colors derive from the style. Changing layers, weights and style image produces a number of variations, but the range is quite limited.

[image: sh3-i12000-paasikivi-kahvila-cl234sl124-cw200sw40000_150]

[image: sh3-i12000-paasikivi-feininger-cl234sl124-cw200sw40000_150]

Another model I trained produced mainly clouds or blobs of color:

[image: sibir-sh86000g_310]

It seems to me that these limitations derive from a too-small dataset and too few training iterations. One also needs to consider the contents of the dataset. Even if the training is successful, the model only learns to recognize the features that stand out in the dataset. To work well, it should recognize the features that are essential in both content and style images. My geometrical shapes dataset resulted in clouds of color, so clearly the model failed to recognize the essential features in the images.

I have not used CIFAR10, but I assume that the small size of the images might be a handicap. In another thread here, a hypothesis was raised that a model in neural-style works best with images of the same size as the training images.

Roaming a bit further, I have recently been interested in unsupervised training, using a model which first crunches the image into a vector (such as FC6 output) and then reconstructs the image using deconvolutional and unpooling layers. With this approach, we don't need labels, as the model will learn by comparing the input and output images.

htoyryla avatar Jul 25 '16 07:07 htoyryla

The material about finetuning using wikiart can be found here https://computing.ece.vt.edu/~f15ece6504/homework2/ . I see it as mainly useful for the image URLs and labels, as a basis for making an LMDB for caffe. And for neural-style, forget Alexnet; it requires GROUP, which is not supported by loadcaffe.

htoyryla avatar Jul 25 '16 07:07 htoyryla

For anyone who is interested, here's one of my VGG16 train prototxt files. Some configuration will be needed if you want to use it.

name: "VGG_hplaces_16_layers"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/hannu/caffe/hplaces/hplaces_train_lmdb"
    backend: LMDB
    batch_size: 28
  }
  transform_param {
    crop_size: 224
    #mirror: true
    mean_file: "/home/hannu/caffe/hplaces/hplaces_train_mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {

    source: "/home/hannu/caffe/hplaces/hplaces_val_lmdb/"
    backend: LMDB
    batch_size: 10
  }
  transform_param {
    crop_size: 224
    #mirror: false
    mean_file: "/home/hannu/caffe/hplaces/hplaces_val_mean.binaryproto"
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3_2"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "conv3_3"
  name: "conv3_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3_3"
  top: "conv3_3"
  name: "relu3_3"
  type: RELU
}
layers {
  bottom: "conv3_3"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "conv4_1"
  name: "conv4_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_1"
  top: "conv4_1"
  name: "relu4_1"
  type: RELU
}
layers {
  bottom: "conv4_1"
  top: "conv4_2"
  name: "conv4_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_2"
  top: "conv4_2"
  name: "relu4_2"
  type: RELU
}
layers {
  bottom: "conv4_2"
  top: "conv4_3"
  name: "conv4_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4_3"
  top: "conv4_3"
  name: "relu4_3"
  type: RELU
}
layers {
  bottom: "conv4_3"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool4"
  top: "conv5_1"
  name: "conv5_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_1"
  top: "conv5_1"
  name: "relu5_1"
  type: RELU
}
layers {
  bottom: "conv5_1"
  top: "conv5_2"
  name: "conv5_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_2"
  top: "conv5_2"
  name: "relu5_2"
  type: RELU
}
layers {
  bottom: "conv5_2"
  top: "conv5_3"
  name: "conv5_3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv5_3"
  top: "conv5_3"
  name: "relu5_3"
  type: RELU
}
layers {
  bottom: "conv5_3"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "fc6"
  type: INNER_PRODUCT
  bottom: "pool5"
  top: "fc6"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layers {
  name: "relu6"
  type: RELU
  bottom: "fc6"
  top: "fc6"
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layers {
  name: "relu7"
  type: RELU
  bottom: "fc7"
  top: "fc7"
}
layers {
  name: "drop7"
  type: DROPOUT
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layers {
  bottom: "fc7"
  top: "fc8_places"
  name: "fc8_places"
  type: INNER_PRODUCT
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 205
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "fc8_places"
  top: "prob"
  name: "prob"
  type: SOFTMAX
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8_places"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "fc8_places"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}

You need to change the pointers to your dataset and mean files, and maybe the batch sizes too. You may also want to comment out the prob layer to get cleaner output during training.

htoyryla avatar Jul 25 '16 07:07 htoyryla

If you want a big image set for training, you can download the imagenet database. It is what was used to train the default vgg-19 model.

http://image-net.org

3DTOPO avatar Jul 25 '16 07:07 3DTOPO

Imagenet is certainly a good choice if one wants to train with a general image set and has the computing platform for large scale training. I am planning to get another linux machine dedicated for training but for the moment I cannot tie up my linux computer long enough for other than small experiments (which are good for learning anyway).

htoyryla avatar Jul 25 '16 08:07 htoyryla

@htoyryla As far as I understand, fine-tuning an already trained model means that you can use a smaller data set.

So I have this data set here with art images:

I just posted a few examples, but every category seems to have between 50 and 80 images. People-Art has multiple areas such as Annotations and JPEGImages, whereas Photo-Art does not. Would the wiki-art data set or the People-Art/Photo-Art-50 data set be better for training?


People-Art: 

People-Art\Annotations\Academicism\albert-anker_b-ckligumpen-1866.jpg.xml
People-Art\Annotations\Academicism\albert-joseph-moore_amber.jpg.xml

People-Art\JPEGImages\Academicism\albert-anker_b-ckligumpen-1866.jpg
People-Art\JPEGImages\Academicism\albert-joseph-moore_amber.jpg

People-Art\matlab_funcs\demo_show_anno.m
People-Art\matlab_funcs\VOCevaldet_cai.m

People-Art\test.txt
People-Art\train.txt
People-Art\trainval.txt
People-Art\trainval_only_fg_ims.txt
People-Art\val.txt




Photo-Art-50:

Photo-Art-50\016.boom-box\016a_0001.jpg
Photo-Art-50\101.head-phones\101a_0001.jpg
Photo-Art-50\101.head-phones\101a_0002.jpg

And this previously fine tuned model here that already produces good images in neural-style:

https://gist.github.com/jimmie33/509111f8a00a9ece2c3d5dde6a750129#file-readme-md

How would I, step by step, convert this data set into the LMDB files, and then how exactly would I use your prototxt to train the already-made caffemodel? What train.prototxt and solver.prototxt files do I need, and which ones do I modify? What modifications do I make? I have tried modifying files where it was unclear, based on the naming, which one I was supposed to replace. I tried making a NIN model like the one in Neural-Style using the CIFAR10 data set, but it had the exact same number of layers as my previous CIFAR10 model and not the same layers as Neural-Style's NIN model.

I found this fine tuning command on the Berkeley site:

./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0

I can easily modify the paths and filenames, but is it the right command to use?


With the wiki-art data set, how exactly do I convert it to the lmdb files that I need? This lmdb part is probably the most confusing part of neural networks for me because I have not found any guides that let me make sense of what exactly I have to do.

And @htoyryla , if possible, could you post the lmdb files and mean files you made from the wiki-art data set for me to download?

ProGamerGov avatar Jul 25 '16 21:07 ProGamerGov

So I tried to fine-tune the VGG16 SOD model on the CIFAR10 data set, and received the following error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0

I0726 00:44:44.228581  1820 layer_factory.hpp:74] Creating layer data
I0726 00:44:44.228623  1820 net.cpp:84] Creating Layer data
I0726 00:44:44.228648  1820 net.cpp:338] data -> data
I0726 00:44:44.228682  1820 net.cpp:338] data -> label
I0726 00:44:44.228709  1820 net.cpp:113] Setting up data
I0726 00:44:44.228801  1820 db.cpp:34] Opened lmdb /home/ubuntu/caffe/examples/cifar10/cifar10_train_lmdb
I0726 00:44:44.228873  1820 data_layer.cpp:67] output data size: 28,3,224,224
I0726 00:44:44.228899  1820 data_transformer.cpp:22] Loading mean file from: /home/ubuntu/caffe/data/cifar10/cifar10_train_mean.binaryproto
I0726 00:44:44.234645  1820 net.cpp:120] Top shape: 28 3 224 224 (4214784)
I0726 00:44:44.234693  1820 net.cpp:120] Top shape: 28 (28)
I0726 00:44:44.234710  1820 layer_factory.hpp:74] Creating layer conv1_1
I0726 00:44:44.234742  1820 net.cpp:84] Creating Layer conv1_1
I0726 00:44:44.234756  1820 net.cpp:380] conv1_1 <- data
I0726 00:44:44.234807  1820 net.cpp:338] conv1_1 -> conv1_1
I0726 00:44:44.234838  1820 net.cpp:113] Setting up conv1_1
F0726 00:44:44.241438  1825 data_transformer.cpp:138] Check failed: height <= datum_height (224 vs. 32)
*** Check failure stack trace: ***
    @     0x7f38355c4daa  (unknown)
    @     0x7f38355c4ce4  (unknown)
    @     0x7f38355c46e6  (unknown)
    @     0x7f38355c7687  (unknown)
    @     0x7f38359303c1  caffe::DataTransformer<>::Transform()
    @     0x7f38359eb4f8  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f382d2e5a4a  (unknown)
    @     0x7f382b73c182  start_thread
    @     0x7f3834baf47d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

I was also using this solver.prototxt: https://github.com/ruimashita/caffe-train/blob/master/vgg.solver.prototxt and htoyryla's train_val.prototxt

Same error on the normal VGG-16 model:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16/solver.prototxt -weights models/vgg16/VGG_ILSVRC_16_layers.caffemodel -gpu 0

layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  inner_product_param {
    num_output: 1000
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss/loss"
}
I0726 00:55:56.447276  1872 layer_factory.hpp:74] Creating layer data
I0726 00:55:56.447317  1872 net.cpp:84] Creating Layer data
I0726 00:55:56.447342  1872 net.cpp:338] data -> data
I0726 00:55:56.447377  1872 net.cpp:338] data -> label
I0726 00:55:56.447404  1872 net.cpp:113] Setting up data
I0726 00:55:56.447495  1872 db.cpp:34] Opened lmdb /home/ubuntu/caffe/examples/cifar10/cifar10_train_lmdb
I0726 00:55:56.447563  1872 data_layer.cpp:67] output data size: 64,3,224,224
I0726 00:55:56.458580  1872 net.cpp:120] Top shape: 64 3 224 224 (9633792)
I0726 00:55:56.458628  1872 net.cpp:120] Top shape: 64 (64)
I0726 00:55:56.458647  1872 layer_factory.hpp:74] Creating layer conv1_1
I0726 00:55:56.458678  1872 net.cpp:84] Creating Layer conv1_1
I0726 00:55:56.458693  1872 net.cpp:380] conv1_1 <- data
I0726 00:55:56.458720  1872 net.cpp:338] conv1_1 -> conv1_1
I0726 00:55:56.458788  1872 net.cpp:113] Setting up conv1_1
F0726 00:55:56.465386  1877 data_transformer.cpp:138] Check failed: height <= datum_height (224 vs. 32)
*** Check failure stack trace: ***
    @     0x7f22574a2daa  (unknown)
    @     0x7f22574a2ce4  (unknown)
    @     0x7f22574a26e6  (unknown)
    @     0x7f22574a5687  (unknown)
    @     0x7f225780e3c1  caffe::DataTransformer<>::Transform()
    @     0x7f22578c94f8  caffe::DataLayer<>::InternalThreadEntry()
    @     0x7f224f1c3a4a  (unknown)
    @     0x7f224d61a182  start_thread
    @     0x7f2256a8d47d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

ProGamerGov avatar Jul 26 '16 00:07 ProGamerGov

I took the Cubo-Futurism jpg files from the People-Art data set. I then tried, and failed, to create the val and train LMDB files.

ProGamerGov avatar Jul 26 '16 02:07 ProGamerGov

You get the error because my training VGG16 prototxt (like any imagenet-based prototxt) expects 256x256 images (which are then cropped to 224x224 according to the prototxt) and CIFAR is 32x32.

Check failed: height <= datum_height (224 vs. 32)

I can help with LMDB and prototxt but for a few days I am terribly busy with other things and mostly not even near a computer.

LMDB is created using a script like caffe/examples/imagenet/create_imagenet.sh, but the script usually needs to be adjusted for paths etc. It can take some time to get used to it and get everything to match, so that the script finds the train.txt and val.txt files as well as the images referred to in them, and the image sizes are correct; then it creates two LMDB files. Then you calculate the mean images based on the LMDBs using caffe/examples/imagenet/make_imagenet_mean.sh (or something like that). Then modify the training prototxt to point to your LMDBs and binaryproto files. And make sure the solver.prototxt points to the correct training prototxt.

The train.txt and val.txt for the LMDB creation contain lines like

path_to_an_image label

where label is an integer from 0 .. number_of_categories-1

The handling of paths can be a bit tricky. They are relative to paths set in create_imagenet.sh, but it took me some time to get the paths right.

This is all I can contribute right now. After a few days I will have better time to respond. I am not sure if I have my wikiart LMDB any more, I have other LMDBs but they are usually quite large files.

PS. See also the caffe imagenet example for the LMDB part (never mind if the page talks about leveldb instead of lmdb, it is an alternative option). http://caffe.berkeleyvision.org/gathered/examples/imagenet.html You might also try the example as such, then the paths should match readily.
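If it helps, a small Python sketch along these lines can generate the "path label" lines for a folder-per-category dataset, one subfolder per category (the root path and output file name here are just assumptions):

import os

root = '/home/ubuntu/caffe/data/People-Art/JPEGImages'  # assumed dataset root
out = open('all.txt', 'w')

# sort the category folders so the labels 0..N-1 stay stable between runs
categories = sorted(d for d in os.listdir(root)
                    if os.path.isdir(os.path.join(root, d)))
for label, cat in enumerate(categories):
    catdir = os.path.join(root, cat)
    for name in sorted(os.listdir(catdir)):
        if name.lower().endswith('.jpg'):
            # one "full_path label" line per image, as caffe expects
            out.write('%s %d\n' % (os.path.join(catdir, name), label))
out.close()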

htoyryla avatar Jul 26 '16 04:07 htoyryla

So I have my images at:

/home/ubuntu/caffe/data/People-Art/JPEGImages/Academicism
/home/ubuntu/caffe/data/People-Art/JPEGImages/AnalyticalRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/ArtDeco
/home/ubuntu/caffe/data/People-Art/JPEGImages/ArtNouveau(Modern)
/home/ubuntu/caffe/data/People-Art/JPEGImages/Biedermeier
/home/ubuntu/caffe/data/People-Art/JPEGImages/cartoon
/home/ubuntu/caffe/data/People-Art/JPEGImages/Classicism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Constructivism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Cubo-Futurism
/home/ubuntu/caffe/data/People-Art/JPEGImages/Divisionism
/home/ubuntu/caffe/data/People-Art/JPEGImages/EnvironmentalArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/FantasticRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/FeministArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/HighRenaissance
/home/ubuntu/caffe/data/People-Art/JPEGImages/Impressionism
/home/ubuntu/caffe/data/People-Art/JPEGImages/InternationalGothic
/home/ubuntu/caffe/data/People-Art/JPEGImages/Japonism
/home/ubuntu/caffe/data/People-Art/JPEGImages/LowbrowArt
/home/ubuntu/caffe/data/People-Art/JPEGImages/MagicRealism
/home/ubuntu/caffe/data/People-Art/JPEGImages/MechanisticCubism

etc...

Full list of the folders containing images and ls of cd People-Art: https://gist.github.com/ProGamerGov/4627306588e9d232aa0431c4e26b9687

Each folder of images has a "gt.txt" file. This is what the gt.txt file looks like:

https://gist.github.com/ProGamerGov/2339b815b9e462cb69cd5bb7d156ee9a

Though I believe this may be part of the Cross-Depiction aspect of the data set.

My train.txt and val.txt at:

/home/ubuntu/caffe/data/People-Art/train.txt 
/home/ubuntu/caffe/data/People-Art/val.txt 

train.txt: https://gist.github.com/ProGamerGov/1be5afe398c825cfc3ea119005af71fb val.txt: https://gist.github.com/ProGamerGov/08b121968b28e9f09ddf3e096f424944

My create_imagenet.sh file: https://gist.github.com/ProGamerGov/5f92bdc8e7d83756268f438cf15261eb

located at: /home/ubuntu/caffe/create_imagenet_2.sh

The prototxt of the model I want to fine tune has crop_size: 224, do I need to make the resize value in my create_imagenet_2.sh script the same value?

RESIZE_HEIGHT=256
RESIZE_WIDTH=256

I then run:

ubuntu@ip-Address:~/caffe$ ./create_imagenet_2.sh

Creating train lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:17:01.271579  2440 convert_imageset.cpp:79] Shuffling data
I0727 00:17:01.660755  2440 convert_imageset.cpp:82] A total of 0 images.
I0727 00:17:01.661175  2440 db.cpp:34] Opened lmdb examples/imagenet/people-art_train_lmdb
Creating val lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:17:01.971226  2451 convert_imageset.cpp:79] Shuffling data
I0727 00:17:02.378626  2451 convert_imageset.cpp:82] A total of 0 images.
I0727 00:17:02.379034  2451 db.cpp:34] Opened lmdb examples/imagenet/people-art_val_lmdb
Done.
ubuntu@ip-Address:~/caffe$

This creates two folders:

/home/ubuntu/caffe/examples/imagenet/people-art_train_lmdb
/home/ubuntu/caffe/examples/imagenet/people-art_val_lmdb

Inside both folders are data.mdb and lock.mdb files. They are all 8 KB each in both folders.

Trying to run the script again results in this:

ubuntu@ip-Address:~/caffe$ ./create_imagenet_2.sh
Creating train lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:19:56.326292  2482 convert_imageset.cpp:79] Shuffling data
I0727 00:19:56.722890  2482 convert_imageset.cpp:82] A total of 0 images.
F0727 00:19:56.723007  2482 db.cpp:27] Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/imagenet/people-art_train_lmdbfailed
*** Check failure stack trace: ***
    @     0x7f5be1af4daa  (unknown)
    @     0x7f5be1af4ce4  (unknown)
    @     0x7f5be1af46e6  (unknown)
    @     0x7f5be1af7687  (unknown)
    @     0x7f5be1e54eee  caffe::db::LMDB::Open()
    @           0x403122  main
    @     0x7f5be0d04ec5  (unknown)
    @           0x403e5c  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
Creating val lmdb...
libdc1394 error: Failed to initialize libdc1394
I0727 00:19:56.955780  2491 convert_imageset.cpp:79] Shuffling data
I0727 00:19:57.348181  2491 convert_imageset.cpp:82] A total of 0 images.
F0727 00:19:57.348299  2491 db.cpp:27] Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/imagenet/people-art_val_lmdbfailed
*** Check failure stack trace: ***
    @     0x7fcbeb0cedaa  (unknown)
    @     0x7fcbeb0cece4  (unknown)
    @     0x7fcbeb0ce6e6  (unknown)
    @     0x7fcbeb0d1687  (unknown)
    @     0x7fcbeb42eeee  caffe::db::LMDB::Open()
    @           0x403122  main
    @     0x7fcbea2deec5  (unknown)
    @           0x403e5c  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
Done.
ubuntu@ip-Address:~/caffe$


This is the readme.txt that came with the data set: https://gist.github.com/ProGamerGov/dfc8652f3db5bc91acdf34ff22c86bd2

I am not exactly sure what is causing my issue, but could it be that the script is not accounting for the structure of my data set?

ProGamerGov avatar Jul 27 '16 00:07 ProGamerGov

You need to put all the information into train.txt and val.txt. That is where caffe expects to find the image paths and the labels. Like this:

/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/egon-schiele_seated-girl-1910.jpg 2
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/salvador-dali_still-life-pulpo-y-scorpa.jpg 2
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/orest-kiprensky_young-gardener-1817.jpg 7
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/david-burliuk_in-the-park.jpg 5
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/giovanni-battista-piranesi_vedute-di-roma-30.jpg 4
/home/hannu/vis/wikiart/VT-F15-ECE6504-HW2/2_finetuning-alexnet-wikiart-style/data/wikiart/images/basuki-abdullah_bocah.jpg 6

" A total of 0 images." means that caffe does not find the image files.

Setting the paths in the train.txt versus create_imagenet.sh can be a bit confusing. Unfortunately I don't have the script file for wikiart anymore. But I think what worked for me was to use full path in the train.txt and set the paths in the script as follows:

EXAMPLE=<full path where to place the lmdb> 
DATA=<full path where to find the train.txt and val.txt>
TOOLS=/home/hannu/caffe/build/tools

TRAIN_DATA_ROOT=/  
VAL_DATA_ROOT=/ 

The root paths are set to / because the train.txt contains full paths. It should also work so that one sets the data root path to a directory and uses relative paths in the txt files, but I remember having some difficulty with that.

I usually write small python scripts to manipulate or create the txt files in the correct format. For my geometrical shapes test I had image files named rect000001.png, ellipse000001.png and so on, so I wrote a python script like this:

from os import listdir
from os.path import isfile, join

mypath = "/home/hannu/work/Geom/data/train/data/"
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

for file in onlyfiles:
  output = mypath + file
  if "rect" in file:
    output = output + " 0"
  elif "ellipse" in file:
    output = output + " 1"
  elif "triangle" in file:
    output = output + " 2"
  elif "xtrap" in file:
    output = output + " 3"
  elif "ytrap" in file:
    output = output + " 4"
  elif "ashape" in file:
    output = output + " 5"
  elif "lshape" in file:
    output = output + " 6"
  elif "oshape" in file:
    output = output + " 7"
  elif "ushape" in file:
    output = output + " 8"
  elif "vshape" in file:
    output = output + " 9" 
  print(output)

and redirected the output into train.txt. Nothing fancy, but it worked.

htoyryla avatar Jul 27 '16 03:07 htoyryla

You might have a problem with your caffe installation, too, as you had this error message:

libdc1394 error: Failed to initialize libdc1394

I haven't seen this. As far as I understand, this library is for FireWire connection which should not be needed. Found this on google https://kradnangel.gitbooks.io/caffe-study-guide/content/caffe_errors.html

htoyryla avatar Jul 27 '16 03:07 htoyryla

I usually write small python scripts to manipulate or create the txt files in the correct format.

https://stackoverflow.com/questions/11003761/notepad-add-to-every-line

I just used this trick to fix my train and val files quickly.

You might have a problem with your caffe installation, too, as you had this error message:

libdc1394 is for video camera usage and not critical to Caffe as far as I understand. I have a few times disabled it and everything still works fine.

ProGamerGov avatar Jul 27 '16 03:07 ProGamerGov

Perhaps you can manage with notepad but for instance for Wikiart, I think I created the txt files from a downloaded csv file which had all the paths and labels but not in the correct format. Also once I needed to change the label numbering starting from zero instead of one.

htoyryla avatar Jul 27 '16 03:07 htoyryla

One more thing if you are planning to finetune: you should change the dimension of the fc8 layer (assuming you are training a VGG) to match the number of categories in your dataset. Also, change the name of fc8 to something else, so that caffe will not try to initialize the weights from the original caffemodel, which would fail because of the size mismatch. It is typical to use a name like fc8-10 if you have ten categories.

Like this in the training prototxt:

layers {
  bottom: "fc7"
  top: "fc8_168"
  name: "fc8_168"
  type: INNER_PRODUCT
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 168
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8_168"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "fc8_168"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}

htoyryla avatar Jul 27 '16 03:07 htoyryla

The changes to my create_imagenet_2.sh file, val.txt, train.txt:

https://gist.github.com/ProGamerGov/8267d29262f1bd6570e5918719600695

Still result in the same error.

ProGamerGov avatar Jul 27 '16 03:07 ProGamerGov

@htoyryla Thanks, I'll make the modifications to my train_val.prototxt.

ProGamerGov avatar Jul 27 '16 03:07 ProGamerGov

Changing the fc8 layer will not solve the LMDB creation. It is another issue which you'll face once you get the LMDB and start finetuning.

htoyryla avatar Jul 27 '16 03:07 htoyryla

I still don't see the labels in your train.txt, only the image paths.

htoyryla avatar Jul 27 '16 03:07 htoyryla

For the labels, do I use a different number for each category?

ProGamerGov avatar Jul 27 '16 03:07 ProGamerGov

Yes, the labels should be integers from 0 to number_of_categories - 1 as I wrote earlier.

During training, caffe will feed each image into the model and, as there is an output for each label, train the model to activate the correct output for each image. Without the labels, there is nothing to guide the training and the model will not learn anything. Also, if all images have the same label, the model simply learns to always output that label regardless of the image, so it will not learn anything about the images. It is only when the labels tell something essential about the images that meaningful learning is possible.

htoyryla avatar Jul 27 '16 04:07 htoyryla

Ok, I think I got it now. Change the fc8_168 to fc8_43 because I have 43 categories. Then change it to fcpa_43. Even with scripts in Notepad, it will take me a little while to label all the categories. Do I need to do this for both the train and val txt files, or just the one?

ProGamerGov avatar Jul 27 '16 04:07 ProGamerGov

train.txt and val.txt both have to conform to this format. They also should not include the same files, as val.txt is used to crosscheck that the model really learns to generalize and does not simply remember the individual images. I usually first make a train.txt containing all images & labels and then use a script to move every tenth entry to val.txt, as in the sketch below.
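A minimal sketch of that every-tenth-entry split (the file names are assumptions; the combined list is read from all.txt):

lines = open('all.txt').readlines()  # combined "path label" lines
with open('train.txt', 'w') as tr, open('val.txt', 'w') as va:
    for i, line in enumerate(lines):
        # every tenth entry goes to the validation set
        (va if i % 10 == 0 else tr).write(line)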

I might first make very short txt files to test if the lmdb creation succeeds. There may still be an issue in the create_imagenet.sh, too. I have sometimes struggled with the paths, everything looked ok but 0 images found, until suddenly after changing something back and forth it worked.

htoyryla avatar Jul 27 '16 04:07 htoyryla

I didn't understand your "Then change it to fcpa_43". It should be enough to change to fc8_43, so that the layer name is not fc8, which is in the caffemodel you will finetune.

htoyryla avatar Jul 27 '16 04:07 htoyryla

@htoyryla Ok, thanks for the help!

ProGamerGov avatar Jul 27 '16 04:07 ProGamerGov

So I successfully created the LMDB files!

https://gist.github.com/ProGamerGov/d0038f7e3186d057bb7b26398bd764f9

It seems that a few of the images listed in the train.txt and val.txt files did not exist in the actual data set.

ProGamerGov avatar Jul 27 '16 05:07 ProGamerGov

It happened to me too, now that you mention it. Many (most?) datasets do not contain the actual images, only links for downloading from the original location. Probably some of the wikiart urls no longer work, so those files don't get downloaded. It is like broken links, not unusual on the internet.

htoyryla avatar Jul 27 '16 06:07 htoyryla

Trying to start the fine tuning, seems to be throwing out an error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel.caffemodel -gpu 0
libdc1394 error: Failed to initialize libdc1394
I0728 00:34:55.191102  1907 caffe.cpp:113] Use GPU with device ID 0
I0728 00:34:55.575220  1907 caffe.cpp:121] Starting Optimization
I0728 00:34:55.575352  1907 solver.cpp:32] Initializing solver from parameters:
test_iter: 10
test_interval: 100
base_lr: 0.0005
display: 10
max_iter: 450000
lr_policy: "step"
gamma: 0.001
momentum: 0.9
weight_decay: 0.0005
stepsize: 1000
snapshot: 100
snapshot_prefix: "VGG16_SOD_finetune"
solver_mode: CPU
net: "/home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt"
I0728 00:34:55.575443  1907 solver.cpp:70] Creating training net from net file: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 33:90: String literals cannot cross line boundaries.
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 34:2: String literals cannot cross line boundaries.
F0728 00:34:55.576668  1907 upgrade_proto.cpp:928] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
*** Check failure stack trace: ***
    @     0x7efff1371daa  (unknown)
    @     0x7efff1371ce4  (unknown)
    @     0x7efff13716e6  (unknown)
    @     0x7efff1374687  (unknown)
    @     0x7efff16d0f2e  caffe::ReadNetParamsFromTextFileOrDie()
    @     0x7efff17a5f12  caffe::Solver<>::InitTrainNet()
    @     0x7efff17a6f43  caffe::Solver<>::Init()
    @     0x7efff17a7116  caffe::Solver<>::Solver()
    @           0x40d210  caffe::GetSolver<>()
    @           0x4071e1  train()
    @           0x405781  main
    @     0x7efff0883ec5  (unknown)
    @           0x405d2d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$

My solver.prototxt and train_val.prototxt: https://gist.github.com/ProGamerGov/dd88c6752fda7d6ff9dc22f00e4acd4c

Edit: Line 33's quotation mark was on line 34.

And I had an incorrect path here:

mean_file: "/home/ubuntu/caffe/data/people-art_train_mean.binaryproto" Fixed:

mean_file: "/home/ubuntu/caffe/examples/imagenet/people-art_train_mean.binaryproto"

ProGamerGov avatar Jul 28 '16 00:07 ProGamerGov

So everything was working well until this happened:

Memory required for data: 1152053324 <--- What's this measured in?

I0728 00:55:17.877467  2016 solver.cpp:315]     Test net output #2042: prob = 0.000113739
I0728 00:55:17.877477  2016 solver.cpp:315]     Test net output #2043: prob = 4.07005e-06
I0728 00:55:17.877488  2016 solver.cpp:315]     Test net output #2044: prob = 0.0013953
I0728 00:55:17.877499  2016 solver.cpp:315]     Test net output #2045: prob = 3.87571e-06
I0728 00:55:17.877509  2016 solver.cpp:315]     Test net output #2046: prob = 0.000115883
I0728 00:55:17.877521  2016 solver.cpp:315]     Test net output #2047: prob = 0.000124944
I0728 00:55:17.877531  2016 solver.cpp:315]     Test net output #2048: prob = 6.056e-06
I0728 00:55:17.877542  2016 solver.cpp:315]     Test net output #2049: prob = 0.000191529
I0728 00:55:17.877552  2016 solver.cpp:315]     Test net output #2050: prob = 0.000380109
F0728 00:55:19.191797  2016 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f44cb886daa  (unknown)
    @     0x7f44cb886ce4  (unknown)
    @     0x7f44cb8866e6  (unknown)
    @     0x7f44cb889687  (unknown)
    @     0x7f44cbcb1e1b  caffe::SyncedMemory::mutable_gpu_data()
    @     0x7f44cbbf6323  caffe::Blob<>::mutable_gpu_diff()
    @     0x7f44cbcc9e60  caffe::CuDNNConvolutionLayer<>::Backward_gpu()
    @     0x7f44cbc08f4c  caffe::Net<>::BackwardFromTo()
    @     0x7f44cbc09191  caffe::Net<>::Backward()
    @     0x7f44cbcbeb2d  caffe::Solver<>::Step()
    @     0x7f44cbcbf40f  caffe::Solver<>::Solve()
    @           0x407246  train()
    @           0x405781  main
    @     0x7f44cad98ec5  (unknown)
    @           0x405d2d  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
ubuntu@ip-Address:~/caffe$ 

Googling the issue suggests changing the batch_size: values.

layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/home/ubuntu/caffe/examples/imagenet/people-art_train_lmdb"
    backend: LMDB
    batch_size: 28
  }
  transform_param {
    crop_size: 224
    #mirror: true
    mean_file: "/home/ubuntu/caffe/examples/imagenet/people-art_train_mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {

    source: "/home/ubuntu/caffe/examples/imagenet/people-art_val_lmdb"
    backend: LMDB
    batch_size: 10
  }
  transform_param {
    crop_size: 224
    #mirror: false
    mean_file: "/home/ubuntu/caffe/examples/imagenet/people-art_val_mean.binaryproto"
  }
  include: { phase: TEST }
}

Changing batch_size: 28 to 12 seems to have fixed the issue.

ProGamerGov avatar Jul 28 '16 00:07 ProGamerGov


iter_300 accuracy = 0.25
iter_400 accuracy = 0.31
iter_600 accuracy = 0.21
iter_700 accuracy = 0.22
iter_800 accuracy = 0.16
iter_900 accuracy = 0.23
iter_1300 accuracy = 0.21
iter_1500 accuracy = 0.24


Are these accuracy values a good sign, bad sign, or is it too hard to tell?

The working files and command I used are here: https://gist.github.com/ProGamerGov/068ffa55981e8dac80572ccbd49955ab


Second Try:

In theory when you reduce the batch_size by a factor of X then you should increase the base_lr by a factor of sqrt(X)

Source: https://github.com/BVLC/caffe/issues/430

28/2=14

batch_size: 28 to batch_size: 14

batch_size: 10 to batch_size: 5

base_lr: 0.0005

~~√(0.0005) = 0.0223607~~

~~so~~

~~base_lr: 0.0223607~~

That had an accuracy of 0.

(0.0005)(√(2)) = 0.000707107
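In code, the corrected arithmetic (note the struck-out attempt above took the square root of the learning rate itself instead of multiplying by √X):

import math

old_batch, new_batch = 28, 14   # halving the batch size, so X = 2
base_lr = 0.0005
new_lr = base_lr * math.sqrt(float(old_batch) / new_batch)
print(new_lr)  # 0.000707106...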

Now let's try out the changes:

./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0 2>&1 | tee log.txt


Test 3 had accuracy = 0.07 for iteration 100

Iteration 0 accuracy = 0

batch_size: 14

batch_size: 10

base_lr: 0.000707107

Test 4 was accuracy = 0 for iteration 100

Iteration 0 accuracy = 0

batch_size: 14

batch_size: 5

base_lr: 0.000707107

Test 5 had accuracy = 0.16 at iteration 100

Iteration 0 accuracy = 0

batch_size: 12 

batch_size: 5

base_lr: 0.0005


Test 6 had accuracy = 0.07 at iteration 100.

Iteration 0 accuracy = 0.02

batch_size: 12 

batch_size: 10

base_lr: 0.0005

Test 7 had accuracy = 0.12 at iteration 100

Iteration 0 accuracy = 0

batch_size: 16 

batch_size: 10

base_lr: 0.0005

Test 8 had accuracy = 0.16 at iteration 100

Iteration 0 accuracy = 0

batch_size: 16 

batch_size: 10

base_lr: 0.0005

ProGamerGov avatar Jul 28 '16 01:07 ProGamerGov

Test 9:

batch_size: 12

batch_size: 5

base_lr: 0.0005

Accuracy:

Iterations Accuracy
100 accuracy = 0
200 accuracy = 0.24
300 accuracy = 0.18
400 accuracy = 0.24
500 accuracy = 0.26
600 accuracy = 0.34
700 accuracy = 0.12
800 accuracy = 0.2
900 accuracy = 0.28
1000 accuracy = 0.38
1100 accuracy = 0.28
1200 accuracy = 0.24
1300 accuracy = 0.22
1400 accuracy = 0.26
1500 accuracy = 0.34
1600 accuracy = 0.2
1700 accuracy = 0.26
1800 accuracy = 0.3
1900 accuracy = 0.3
2000 accuracy = 0.24
2100 accuracy = 0.24
2200 accuracy = 0.3
2300 accuracy = 0.32
2400 accuracy = 0.26
2500 accuracy = 0.24
2600 accuracy = 0.26
2700 accuracy = 0.34
2800 accuracy = 0.28
2900 accuracy = 0.24
3000 accuracy = 0.24
3100 accuracy = 0.2
3200 accuracy = 0.36
3300 accuracy = 0.3
3400 accuracy = 0.24
3500 accuracy = 0.2
3600 accuracy = 0.36
3700 accuracy = 0.28
3800 accuracy = 0.26
3900 accuracy = 0.22
4000 accuracy = 0.32
4100 accuracy = 0.3
4200 accuracy = 0.26
4300 accuracy = 0.22
4400 accuracy = 0.3
4500 accuracy = 0.34
4600 accuracy = 0.22
4700 accuracy = 0.26
4800 accuracy = 0.3

Not sure if these are the results I am supposed to be getting?

ProGamerGov avatar Jul 28 '16 04:07 ProGamerGov

You seem to be getting along well.

However, in your training prototxt, you need to change the line

 num_output: 205 

to match the number of your categories.

 num_output: 43

Now you have 160+ unused outputs, which mess up the accuracy. Change it and see how it affects the accuracy. Anyway, one should be prepared to run at least tens of thousands of iterations.

htoyryla avatar Jul 28 '16 06:07 htoyryla

@htoyryla Thanks, I missed that mistake. Hopefully that will help with the accuracy value. Though I may have to play around more with the base_lr and batch_size values because I had previously done so with those accidental extra categories.

When I run ./caffe/tools/extra/parse_log.py mylog.log ./

The "mylog.log.train" file is properly filled with data. But the "mylog.log.test" file only has NumIters,Seconds,TestAccuracy,TestLoss and nothing else. Not sure what is causing this issue.

ProGamerGov avatar Jul 28 '16 06:07 ProGamerGov

ProGamerGov [email protected] wrote on 28 Jul 2016 at 9:59:

@htoyryla Thanks, I missed that mistake. Hopefully that will help with the accuracy value. Though I may have to play around more with the base_lr and batch_size values because I had previously done so with those accidental extra categories.

Your memory limits the batch size anyway. Use the largest size for training that doesn’t give an out-of-memory error.

As to learning rate, I have simply tried decreasing it until the losses start decreasing.

I haven’t used parse_log.py, I have only looked at the output on the screen. If you only see lots of lines with ”prob” values, then you can comment out the prob layer in the training prototxt (I mentioned this earlier). Then you should be able to view the loss printed for every nth training iteration (according to how you set it in the solver.prototxt).

htoyryla avatar Jul 28 '16 07:07 htoyryla

This is the prob layer and I just comment it out like this, correct?

#layers {
#  bottom: "fc8_43"
#  top: "prob"
#  name: "prob"
#  type: SOFTMAX
#}

Your memory limits the batch size anyway. Use the largest size for training that doesn’t give out of memory.

As to learning rate, I have simply tried decreasing it until the losses start decreasing.

Thanks, I was looking for this knowledge but couldn't find it using Google or GitHub's search function. I'll try to fine-tune the values tomorrow when I get the chance. I know some people have the values set up to change after a certain number of iterations, so how crucial is something like that for fine tuning?

ProGamerGov avatar Jul 28 '16 07:07 ProGamerGov

The commenting out looks ok to me.

I was for some time baffled by the prob output lines, which made it difficult to see the loss and accuracy outputs, until I found out that because the prob layer is not used as input to any other layer, caffe prints it out.

htoyryla avatar Jul 28 '16 07:07 htoyryla

About the finetuning in general. The losses and accuracy tell how well the output matches the labels. As the convolutional layers are already trained, the FC layers learn relatively quickly to give the right outputs. This is the idea of fine-tuning, adapt the fc layers to the new classification task. Therefore it is typical to set learning rate of convolutional layers to 0 (in the prototxt).
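For that typical case, one hedged way to zero out the conv learning rates in bulk, instead of editing every layer by hand, is via caffe's protobuf bindings (the file names are assumptions; this targets the old V1 "layers" syntax used in the prototxt above):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
text_format.Merge(open('vgg16_train_val.prototxt').read(), net)

for layer in net.layers:  # V1-style layers, as in this thread's prototxt
    if layer.type == caffe_pb2.V1LayerParameter.CONVOLUTION:
        del layer.blobs_lr[:]
        layer.blobs_lr.extend([0.0, 0.0])  # weight lr, bias lr

open('vgg16_train_val_frozen.prototxt', 'w').write(
    text_format.MessageToString(net))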

For neural-style, the fc layers are not of any interest. In my prototxt, also the convolutional layers adapt to the new data. But as the learning is controlled top down, the upper layers learn faster. Therefore even if the losses get to a good level, one perhaps should continue the training to allow the conv layers to adapt better to the new data.

One can use the snapshot caffemodels for trying them out in neural-style (or in convis). I also once made a lua script which compares the original and trained models for how much the weights have changed (max and avg values). I'll post it if I find it.

htoyryla avatar Jul 28 '16 07:07 htoyryla

Here's the code to compare the weights of a trained snapshot with the original https://gist.github.com/htoyryla/bb27efb4d6dedff87810a35ff083f44c

Change the paths to match your models. Note also that the script takes the iteration number of the snapshot as a parameter.

th spred2.lua 10000 

That also explains this line (be careful when editing it):

fn = "/home/hannu/train/hplaces290516_iter_" .. arg[1] .. ".caffemodel"

The output gives the layer name, change of max weight, change of avg weight for conv layers, and for fc layers, the layer name, matrix difference, change of max weight and the change of avg weight. This is not meant to be an accurate tool but only to give an indication how each layer is changing.
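For anyone not using Torch, a rough pycaffe counterpart of the same idea (all file names are assumptions; like the Lua script, this only gives an indication per layer, not an accurate measurement):

import caffe
import numpy as np

proto = 'vgg16_deploy.prototxt'  # assumed deploy prototxt matching both models
orig = caffe.Net(proto, 'original.caffemodel', caffe.TEST)
tuned = caffe.Net(proto, 'snapshot_iter_10000.caffemodel', caffe.TEST)

for name in orig.params:
    if name not in tuned.params:
        continue  # e.g. a renamed fc8 layer
    # compare the weight blobs (params[name][0]); [1] would be the biases
    d = tuned.params[name][0].data - orig.params[name][0].data
    print('%s  max %.6f  avg %.6f' % (name, np.abs(d).max(), np.abs(d).mean()))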

htoyryla avatar Jul 28 '16 08:07 htoyryla

Should I be using the "type" settings in the solver.prototxt for fine tuning? If so, which of the six options should I use?

1. Stochastic Gradient Descent "SGD"
2. AdaDelta "AdaDelta"
3. Adaptive Gradient "AdaGrad"
4. Adam "Adam"
5. Nesterov’s Accelerated Gradient "Nesterov"
6. RMSprop "RMSProp"

Also, could this message, which occurs when I start fine tuning, be of concern?

net: "/home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt"
I0728 23:11:46.251978  6796 solver.cpp:70] Creating training net from net file: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
E0728 23:11:46.253047  6796 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: /home/ubuntu/caffe/models/vgg16_finetune/vgg16_train_val.prototxt
I0728 23:11:46.253442  6796 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
I0728 23:11:46.253579  6796 net.cpp:257] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I0728 23:11:46.253631  6796 net.cpp:257] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0728 23:11:46.254010  6796 net.cpp:42] Initializing net from parameters: 

Also, is it possible to take two or more pre-trained models and combine/merge them into a single model using Caffe?

ProGamerGov avatar Jul 28 '16 23:07 ProGamerGov

My current test that starts at iteration 800 with a test every 100 iterations:

accuracy = 0.25
accuracy = 0.275
accuracy = 0.15
accuracy = 0.2
accuracy = 0.2875     Iteration 1200
accuracy = 0.1375
accuracy = 0.3125                                Iteration 1400
accuracy = 0.1375
accuracy = 0.175
accuracy = 0.2875     Iteration 1700
accuracy = 0.15
accuracy = 0.2875     Iteration 1900
accuracy = 0.15
accuracy = 0.175
accuracy = 0.2875     Iteration 2200
accuracy = 0.15
accuracy = 0.25
accuracy = 0.1625
accuracy = 0.1875
accuracy = 0.3                                     Iteration 2700
accuracy = 0.1375
accuracy = 0.25
accuracy = 0.175       Iteration 3000
accuracy = 0.1875
accuracy = 0.2875     Iteration 3200
accuracy = 0.125
accuracy = 0.25
accuracy = 0.2
accuracy = 0.175
accuracy = 0.3                                    Iteration 3700

Here's my log file: https://gist.github.com/ProGamerGov/fe1623113a5d87b2da6a0f67b4d060bf

I then stopped it and changed the base_lr to 0.0000005

accuracy = 0.125              Iteration 3700
accuracy = 0.3                                     Iteration 3800
accuracy = 0.1375
accuracy = 0.2


Farther tweaking of the values and starting at iteration 3700:

accuracy = 0.125          Iteration 3700
accuracy = 0.3               Iteration 3800
accuracy = 0.1375
accuracy = 0.2          Iteration 4000
accuracy = 0.275              Iteration 4100
accuracy = 0.1375
accuracy = 0.3                                  Iteration 4300
accuracy = 0.1375
accuracy = 0.175
accuracy = 0.2875      Iteration 4600
accuracy = 0.15
accuracy = 0.2875          Iteration 4800
accuracy = 0.15
accuracy = 0.175       Iteration 5000
accuracy = 0.2875               Iteration 5100
accuracy = 0.15
accuracy = 0.25
accuracy = 0.1625
accuracy = 0.1875
accuracy = 0.3                                 Iteration 5600
accuracy = 0.1375
accuracy = 0.25
accuracy = 0.175
accuracy = 0.1875
accuracy = 0.2875              Iteration 6100

ProGamerGov avatar Jul 29 '16 00:07 ProGamerGov

Usually I have simply used SGD which is the default. Recently, one dataset would not start learning at all, then I tried AdaDelta and it worked.

type: "AdaDelta" delta: 1e-6

htoyryla avatar Jul 29 '16 03:07 htoyryla

I would rather be interested in the losses first. Loss is measured on the training set. Are the losses decreasing? If they are, then the network is learning; but if the accuracy is not increasing, it is not learning to generalize (this is called "overfitting").

If the losses are not going down then something else is wrong. I had such a case in my first attempts. I still don't understand why that happened; increasing num_output by one helped, but it does not make sense. Perhaps there was something wrong with the labels to begin with.

Your learning rate looks pretty low already. I've never used values that low.

htoyryla avatar Jul 29 '16 04:07 htoyryla

Here's the log file: https://gist.github.com/ProGamerGov/29219f98178a91ee3ddf039728db9bb3

num_output: increasing it by one means there is a new category composed of nothing, correct?

Edit: Train.txt and Val.txt with labels: https://gist.github.com/ProGamerGov/6978038a0b40795289cafb554d9311af

ProGamerGov avatar Jul 29 '16 04:07 ProGamerGov

Your log shows every iteration starting from 3700. One cannot really see a trend looking at such a small sample. What counts is the big picture, something like the loss at every 100th iteration starting from zero.

htoyryla avatar Jul 29 '16 04:07 htoyryla

This is from the model's research paper:

CNN feature. We use Caffe [28] for fine-tuning the CNN model pre-trained on ImageNet [44]. Images are resized to 256 × 256 regardless of their original aspect ratios. The top-left, top-right, bottom-left and bottom-right 227×227 crops of an image are used to augment the training data. We use Caffe’s default setting for training the CNN model of [30], but reduce the starting learning rate to 0.001 as in [22]. We stop tuning after around 30 epochs, as the training loss no longer decreases.

The model I am fine tuning is from here: http://cs-people.bu.edu/jmzhang/sos.html

Specific the PDF file of the paper can be found here: http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf

I resized the images I had to 224x224. Could this be the issue?

# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=224
  RESIZE_WIDTH=224
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

ProGamerGov avatar Jul 29 '16 04:07 ProGamerGov

I think the LMDB should be made with 256x256 images. Cropping is then done by caffe as specified in the prototxt. You could try to change the crop to 227 in the prototxt.

If you have 224x224 images in the LMDB you might have to recreate the db.

Is it at all a VGG16 you are trying to fine-tune?

htoyryla avatar Jul 29 '16 04:07 htoyryla

Yes, it's the CNN Object Proposal Models for Salient Object Detection, which is a VGG16 model.

https://github.com/BVLC/caffe/wiki/Model-Zoo

VGG16: This model is used in the paper.
GoogleNet: This model is smaller, faster and slightly better than the VGG16 model.


I think the LMDB should be made with 256x256 images. Cropping is then done by caffe as specified in the prototxt. You could try to change the crop to 227 in the prototxt.

I ran the LMDB code for 224 not 256. I changed the script manually.

ProGamerGov avatar Jul 29 '16 04:07 ProGamerGov

Though the 256 LMDB results in Check failed: datum_height == data_mean_.height() (256 vs. 224). This is because your train_val.prototxt had crop_size: 224. But based on the research paper, it looks like the model was made with crop_size: 256.

ProGamerGov avatar Jul 29 '16 04:07 ProGamerGov

The idea is that the training data is 256x256 in the LMDB, and caffe then crops the images during training to the size specified in the prototxt.

There is some confusion here now. LMDB creation does not look into any prototxt, so I cannot understand how the LMDB creation could fail because of the crop in prototxt (which is done during the training).

Try to recreate the LMDB with 256x256 images.

htoyryla avatar Jul 29 '16 04:07 htoyryla

There is some confusion here now. LMDB creation does not look into any prototxt, so I cannot understand how the LMDB creation could fail because of the crop in prototxt (which is done during the training).

Sorry, my bad, I did not realize the train_val crop and the LMDB resize were separate.

ProGamerGov avatar Jul 29 '16 04:07 ProGamerGov

" But based on the research paper, it looks like the model was made with crop_size: 256."

The research paper clearly says that images were 256x256 and crops are 227x227. Resize for the LMDB and crop while training are separate operations. Also, resize changes the whole image to new dimensions, crop cuts out a part of the image.

htoyryla avatar Jul 29 '16 04:07 htoyryla

Maybe I just need to play around with the solver.prototxt some more to find the values that will let me rise above 31% accuracy. Or let it run for a lot more iterations to understand the overall trend of accuracy results and loss values.

ProGamerGov avatar Jul 29 '16 04:07 ProGamerGov

I would set the solver to print the loss every 100 iterations, start from the beginning and see if the losses are decreasing.

If they are, then start looking at the accuracy and if needed, tweak the learning rate. You might also try AdaDelta which worked for me.
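A minimal solver.prototxt sketch combining these suggestions (the values and paths are placeholder assumptions, not recommendations, and the string-valued type field requires a reasonably recent Caffe):

net: "models/vgg16_finetune/train_val.prototxt"
test_iter: 50
test_interval: 1000     # run the test net (accuracy) every 1000 iterations
display: 100            # print the training loss every 100 iterations
base_lr: 0.001
lr_policy: "fixed"
type: "AdaDelta"
delta: 1e-6
snapshot: 1000
snapshot_prefix: "examples/imagenet/VGG16_SOD_finetune"
solver_mode: GPU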

If everything proceeds nicely up to a point but not beyond, it may depend on many things. Successful training is not easy. On the other hand, finetuning (as opposed to training from scratch) should not be so difficult either. If the improvement stalls, it may be due to deficiencies in the training data, like not enough material for each label, or material that is simply difficult to learn (such as images which could belong to multiple categories).

One thing to remember is that even if the training is not very successful, one can anyway always try how the model works in neural-style.

htoyryla avatar Jul 29 '16 05:07 htoyryla

One thing to remember is that even if the training is not very successful, one can anyway always try how the model works in neural-style.

Just trying iteration 5600 at the moment, and it appears to have pretty visibly changed compared to my control test.

Edit: here is an iteration-200 Neural-Style comparison between my fine-tuned model and the control model I am fine-tuning: https://imgur.com/a/Ul9Ho

Testing shows it's better at Cubism style images than the original model.

Here is the comparison with multiple other models: https://imgur.com/a/FoidP

Control vs Fine-Tuned iter 5600 on three images: https://imgur.com/a/DWL77

ProGamerGov avatar Jul 29 '16 05:07 ProGamerGov

"it's better at Cubism style"

This gives me an idea for a modification to neural-style, for the case where one has a model that outputs style probabilities. I have experimented with using FC layers too in a modified neural-style (see http://liipetti.net/erratic/2016/03/28/controlling-image-content-with-fc-layers/ and the sequels). If the model outputs the style, such as cubism, from FC8_x, one might use it as an additional factor to steer the image towards a particular style. One would then, in addition to the content and style images, select the style category among the possible values at FC8_x. Or several style categories with different weights (because FC8_x can be made to output probabilities for each category, as I describe in http://liipetti.net/erratic/2016/03/31/i-have-seen-a-neural-mirage/). The code I used in my experiments is already quite close, especially the experiments described in http://liipetti.net/erratic/2016/04/20/getting-the-space-back/ .

htoyryla avatar Jul 29 '16 16:07 htoyryla

I added these two layers into the train_val.prototxt to help understand how well training is going.

layers {
  name: "accuracy/top1"
  type: ACCURACY
  bottom: "fc8_43"
  bottom: "label"
  top: "accuracy@1"
  include: { phase: TEST }
  accuracy_param {
    top_k: 1
  }
}
layers {
  name: "accuracy/top5"
  type: ACCURACY
  bottom: "fc8_43"
  bottom: "label"
  top: "accuracy@5"
  include: { phase: TEST }
  accuracy_param {
    top_k: 5
  }
}

ProGamerGov avatar Jul 30 '16 01:07 ProGamerGov

After no success in breaking past the 30-31% level of accuracy, I reinstalled everything on a fresh Ubuntu 16.04 with Cuda 8.0RC and Cudnn v5.

ProGamerGov avatar Jul 30 '16 23:07 ProGamerGov

I received an error at iteration 8900: https://gist.github.com/ProGamerGov/4ac8b8ece45fd5a1a873636cdc673386


I0731 07:23:18.979414 25048 solver.cpp:454] Snapshotting to binary proto file examples/imagenet/VGG16_SOD_finetune_from_scratch_iter_8900.caffemodel
I0731 07:23:22.974056 25048 sgd_solver.cpp:273] Snapshotting solver state to binary proto file examples/imagenet/VGG16_SOD_finetune_from_scratch_iter_8900.solverstate
F0731 07:23:24.131312 25048 io.cpp:69] Check failed: proto.SerializeToOstream(&output) 
*** Check failure stack trace: ***
    @     0x7f59c234c5cd  google::LogMessage::Fail()
    @     0x7f59c234e433  google::LogMessage::SendToLog()
    @     0x7f59c234c15b  google::LogMessage::Flush()
    @     0x7f59c234ee1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f59c2b0f295  caffe::WriteProtoToBinaryFile()
    @     0x7f59c2ae7947  caffe::SGDSolver<>::SnapshotSolverStateToBinaryProto()
    @     0x7f59c2acd534  caffe::Solver<>::Snapshot()
    @     0x7f59c2ace61e  caffe::Solver<>::Step()
    @     0x7f59c2acef49  caffe::Solver<>::Solve()
    @           0x40bd89  train()

Trying to run it again from iteration 8900, or up to iteration 8900 from an earlier snapshot, gives me this: https://gist.github.com/ProGamerGov/e8b0c5507323609e8a252bbba5f68d58

I0731 08:15:54.750869  3529 caffe.cpp:241] Resuming from examples/imagenet/VGG16_SOD_finetune_from_scratch_iter_8900.solverstate
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537750305
F0731 08:16:03.274936  3529 sgd_solver.cpp:316] Check failed: state.history_size() == history_.size() (29 vs. 32) Incorrect length of history blobs.
*** Check failure stack trace: ***
    @     0x7fc0b51875cd  google::LogMessage::Fail()
    @     0x7fc0b5189433  google::LogMessage::SendToLog()
    @     0x7fc0b518715b  google::LogMessage::Flush()
    @     0x7fc0b5189e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc0b5922f7a  caffe::SGDSolver<>::RestoreSolverStateFromBinaryProto()
    @     0x7fc0b5903127  caffe::Solver<>::Restore()
    @           0x40badf  train()
    @           0x4077c8  main
    @     0x7fc0b391e830  __libc_start_main
    @           0x408099  _start
    @              (nil)  (unknown)

However, I could start from the iteration 8900 caffemodel without an error.

ProGamerGov avatar Jul 31 '16 08:07 ProGamerGov

How's your disk space? If one takes snapshots often they can fill a disk surprisingly fast. Happened to me once with a 240 GB SSD.


htoyryla avatar Jul 31 '16 08:07 htoyryla

How's your disk space? If one takes snapshots often they can fill a disk surprisingly fast. Happened to me once with a 240 GB SSD.

The space was full and I suspect that caused the initial stopping and error. However, I can't seem to start it again from the snapshot. Something like a reboot might fix that, but it's way too early in the morning already, so I should get some sleep.

ProGamerGov avatar Jul 31 '16 08:07 ProGamerGov

If the disk got full while saving the snapshot, then the snapshot is likely to be corrupted. Try an earlier one.


htoyryla avatar Jul 31 '16 08:07 htoyryla

If the disk got full while saving the snapshot, then the snapshot is likely to be corrupted. Try an earlier one.

I'll definitely check out that possibility; training was going really well before it happened. I had almost hit my first epoch.

ProGamerGov avatar Jul 31 '16 08:07 ProGamerGov

Here are the test results for iteration 11500:

I0801 00:14:02.203501  3971 caffe.cpp:308] Batch 49, accuracy@5 = 0.166667
I0801 00:14:02.203507  3971 caffe.cpp:313] Loss: 0
I0801 00:14:02.203533  3971 caffe.cpp:325] accuracy = 0.103333
I0801 00:14:02.203552  3971 caffe.cpp:325] accuracy@1 = 0.103333
I0801 00:14:02.203564  3971 caffe.cpp:325] accuracy@5 = 0.343333

It does not work very well in Neural-Style.

ProGamerGov avatar Aug 01 '16 00:08 ProGamerGov

Whenever I add type: "AdaDelta" to my solver.prototxt file, it gives me the following error:

ubuntu@ip-Address:~/caffe$ ./build/tools/caffe train -solver models/vgg16_finetune/solver.prototxt -weights models/vgg16_finetune/VGG16_SOD_finetune.caffemodel -gpu 0 2>&1 | tee ~/mylog.log
libdc1394 error: Failed to initialize libdc1394
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.SolverParameter: 23:5: Message type "caffe.SolverParameter" has no field named "type".
F0801 02:02:46.553591  4373 io.hpp:54] Check failed: ReadProtoFromTextFile(filename, proto)
*** Check failure stack trace: ***
    @     0x7f86d6c24daa  (unknown)
    @     0x7f86d6c24ce4  (unknown)
    @     0x7f86d6c246e6  (unknown)
    @     0x7f86d6c27687  (unknown)
    @           0x407591  train()
    @           0x405781  main
    @     0x7f86d6136ec5  (unknown)
    @           0x405d2d  (unknown)
    @              (nil)  (unknown)
ubuntu@ip-Address:~/caffe$
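(One likely cause, assuming an older Caffe build: the parse error says SolverParameter has no field named type, which suggests the build predates the string-valued type field. Older Caffe versions that already included AdaDelta selected the solver with an enum instead:)

solver_type: ADADELTA
delta: 1e-6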

ProGamerGov avatar Aug 01 '16 02:08 ProGamerGov

You need to give a delta value too, like I showed earlier.

type: "AdaDelta"
delta: 1e-6

This is what I have tried (and it worked with my data when SGD didn't converge at all).

htoyryla avatar Aug 01 '16 04:08 htoyryla

I0801 00:14:02.203501  3971 caffe.cpp:308] Batch 49, accuracy@5 = 0.166667
I0801 00:14:02.203507  3971 caffe.cpp:313] Loss: 0
I0801 00:14:02.203533  3971 caffe.cpp:325] accuracy = 0.103333
I0801 00:14:02.203552  3971 caffe.cpp:325] accuracy@1 = 0.103333
I0801 00:14:02.203564  3971 caffe.cpp:325] accuracy@5 = 0.343333

Probably something is really wrong and this training is not working. If the loss is zero already, then there is nothing to supervise the learning further. I would check the dataset: are all labels present with enough examples in both the train and val datasets? If not, then the model cannot learn all categories successfully.

Still, I am not fully sure how to read your printout, as it does not show the difference between training and testing. Do you have both accuracy and loss in the model at the same time? One usually looks at the loss when training, and at the accuracy when testing. The loss should decrease towards zero and the accuracy increase towards one, if everything is ok.

I am familiar with lines like these (from an unsuccessful training, the only log I have left). Initially I looked at the losses for each iteration, and tested for accuracy at every, say, 1000th iteration. Once it was training well, I started printing the losses less often.

I0305 11:36:38.108285 26966 solver.cpp:338] Iteration 2000, Testing net (#0)
I0305 11:52:51.347494 26966 solver.cpp:406]     Test net output #0: accuracy = 0

I0530 10:51:16.171003 19113 sgd_solver.cpp:106] Iteration 2718, lr = 1e-05
I0530 10:52:43.504497 19113 solver.cpp:229] Iteration 2720, loss = 2.8002

htoyryla avatar Aug 01 '16 05:08 htoyryla

Do you know of any scripts/programs I can use to create the labels for data sets? I think there are a few more problematic categories I want to purge, and a few new categories I want to add.

If I have a smaller (around 100 images maybe?) data set of images which are high resolution, would it make sense to chop them into pieces? Or should I keep them whole?

ProGamerGov avatar Aug 01 '16 07:08 ProGamerGov

Usually the labeling must be done by hand, i.e. one must consider each image separately and decide the correct label.

However, once I created a dataset out of my own photos by using a places-205 model to output labels. In this way I got a file with file paths and labels. There were missing labels in the resulting set (the original model used 205 labels but found only 168 in my photos), so I wrote a script to renumber the labels. But I had to write the scripts myself, both for the labeling using the places model and renumbering the labels, and these scripts are not directly applicable to other cases.

htoyryla avatar Aug 01 '16 08:08 htoyryla

However, once I created a dataset out of my own photos by using a places-205 model to output labels. In this way I got a file with file paths and labels.

Simon Stålenhag's artwork is mostly landscape-oriented in nature, so I wonder if that approach would work well with his artwork?

These are some of the sites/albums I found which had unique artwork from him, do you think a similar approach would work with these?

http://www.simonstalenhag.se/ https://imgur.com/gallery/cGibB https://imgur.com/gallery/ODOi0 https://imgur.com/gallery/VZLDN

His artwork seems like it would work better with currently existing pre-trained models than the other data set I have been trying to use.

Is there at least anyway I can streamline the process of manually labeling images?

ProGamerGov avatar Aug 01 '16 08:08 ProGamerGov

I am not at all sure that the places205 model would produce a meaningful characterization of these. One could test what it sees in a picture, however, using a script such as I describe here http://liipetti.net/erratic/2016/03/31/i-have-seen-a-neural-mirage/

The amount of training images could also be a problem. One needs a lot. I had only some 2600 total for the 168 labels, and that is far too few. It would be better to have 2600 per label.

htoyryla avatar Aug 01 '16 08:08 htoyryla

The amount of training images could also be a problem. One needs a lot. I had only some 2600 total for the 168 labels, and that is far too few.

If I use his work as a single category, I would have about 300-700 images. If I randomly crop the high-res images into pieces, I could stretch that to a larger number. The next trick would be finding a data set (preferably not too large in terms of file size) that I could easily add a category to. I am unsure how to go about creating multiple random crops from each image, but listing every image as a single category seems more doable than trying to label them all separately.
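For the cropping part, a minimal sketch using ImageMagick (assumptions: the tool is installed, the images sit in simon1/, and anything smaller than 256x256 has been filtered out first):

#!/usr/bin/env bash
# Cut 5 random 256x256 crops out of each high-resolution image.
for img in simon1/*.jpg; do
  read w h < <(identify -format "%w %h" "$img")
  for ((i=0;i<5;i++)); do
    x=$(( w > 256 ? RANDOM % (w - 256) : 0 ))   # random top-left corner
    y=$(( h > 256 ? RANDOM % (h - 256) : 0 ))
    convert "$img" -crop "256x256+$x+$y" "${img%.jpg}_crop$i.jpg"
  done
done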

ProGamerGov avatar Aug 01 '16 08:08 ProGamerGov

I guess that if you can find a few hundred images for say, ten different artists, each with clearly different images, then you could train a model to predict which artist's work an image resembles (10 labels). I believe it would work. But I don't know how useful that model would be. The model would learn something about style, but not necessarily enough about objects and features like lines and shapes etc which are essential in neural-style.

htoyryla avatar Aug 01 '16 08:08 htoyryla

But I don't know how useful that model would be. The model would learn something about style, but not necessarily enough about objects and features like lines and shapes etc which are essential in neural-style.

Because fine-tuning exploits what the model already knows when training it on new content, I can use a model trained on a data set that includes artwork. This also means I can train using fewer images than I would need when training from scratch. The popular PASCAL VOC data set contains artwork, and I suspect the ImageNet data set may contain artwork as well. My previous fine-tuning tests were on a model that had already been fine-tuned for picking out prominent objects in real-life images, which makes it harder to train on new content using the original non-fine-tuned model parts. So using a non-fine-tuned model trained on a data set containing artwork should allow me to successfully train it on the desired content.

ProGamerGov avatar Aug 01 '16 09:08 ProGamerGov

Usually finetuning is done by setting the learning rate of the conv layers to zero. One assumes that the conv layers already know the necessary features, and one only needs to re-train the FC layers for the different categorization.
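In the prototxt this means zeroing the per-layer multipliers; a sketch of one frozen conv layer in the newer layer syntax (in the older layers syntax the corresponding field is blobs_lr):

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param { lr_mult: 0 decay_mult: 0 }   # weights: no learning
  param { lr_mult: 0 decay_mult: 0 }   # bias: no learning
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}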

It is not at all obvious what happens to the conv layers when finetuning changes them too, as we are doing now. It is possible that they, based on their previous learning, adapt nicely to the features in the new data. Or it may happen that they start changing towards something new in a detrimental manner (as far as neural-style is concerned). I think I have seen the latter case happen a few times.

It is also the case that it takes a lot of iterations for all the conv layers of a deep model to adapt completely. The idea of fast finetuning comes from not changing the conv layers at all.

But yes, I agree with you that it is better if the new data is similar.

htoyryla avatar Aug 01 '16 09:08 htoyryla

Another matter... when I wrote

The model would learn something about style, but not necessarily enough about objects and features like lines and shapes etc which are essential in neural-style.

I was thinking of the fact that for neural-style, a model must react to features like lines, shapes and texture. For instance, when I trained a model on geometrical shapes of a single color each, it produced only single-color blobs in neural-style. It did not really see the detailed objects and textures which are important. One might have thought that it would produce abstract pictures, which was kind of my goal, but it didn't really, because, failing to see the objects in the content image, it didn't place the colored blobs in a meaningful arrangement.

But of course, when starting from a well trained model which already recognizes objects and textures, one can hope that the further training will not totally mess up the previous capabilities.

htoyryla avatar Aug 01 '16 09:08 htoyryla

My current experiment seems to actually produce a better result than the original Places 365 Hybrid model.

ProGamerGov avatar Aug 02 '16 00:08 ProGamerGov

Should I manually reshuffle my LMDB files every epoch?


Also, can a single image have multiple labels in Caffe if it falls under multiple categories?

Like this example, where all images are part of category one but are then also divided into 3 subcategories:

images/image__101.jpg 1 2
images/image__102.jpg 1 1
images/image__103.jpg 1 2
images/image__104.jpg 1 3
images/image__105.jpg 1 1
images/image__106.jpg 1 3

ProGamerGov avatar Aug 02 '16 01:08 ProGamerGov

In this album: https://imgur.com/a/nxPCC it appears as though the image created with the fine-tuned model is "cleaner" (at least in some parts of the image) than the one from the un-fine-tuned model.

I theorize that lightly fine-tuning a model on the work of the artist who created your intended style image can enhance the model's ability to transfer their style.

ProGamerGov avatar Aug 02 '16 05:08 ProGamerGov

Why is it that the NIN model used by Neural-Style has many usable layers that are not listed in the train_val.prototxt?

ProGamerGov avatar Aug 02 '16 21:08 ProGamerGov

If anyone is interested I can give you the following so that you do not need to spend hours and hours collecting and preparing the artwork.

simon1.tar.gz: 586 images (only colored) | 184 MB
simon2.tar.gz: 725 images (including uncolored sketches and photos of sketches) | 282 MB

None of the images have been resized or cropped yet. A txt file called "filelist.txt" lists every image's name, so all you need to do is add the category values and the paths when making your train.txt and val.txt files for Caffe.


I have discovered that the usefulness of smaller data sets can be increased by using transformations in the train_val.prototxt. This is in addition to fine-tuning, which can lower the required number of images significantly.
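A sketch of the kind of transformations meant here; Caffe's built-in augmentation in the data layer's transform_param is limited to random cropping and random mirroring (the mean file name is an assumption reused from earlier in the thread):

transform_param {
  crop_size: 224   # take a random 224x224 crop from each larger training image
  mirror: true     # randomly flip images horizontally
  mean_file: "examples/imagenet/s_art_mean.binaryproto"
}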

ProGamerGov avatar Aug 03 '16 01:08 ProGamerGov

Hi, for a few days now I've been following your quest to train your own ConvNet. I'm interested in those training sets in order to practice ConvNet tuning.

best

at0mb0y avatar Aug 03 '16 03:08 at0mb0y

@at0mb0y Here are the files:

simon1.tar.gz: https://drive.google.com/file/d/0B--sVcawvPKfSkIyc1ZwX2tOSVE/view?usp=sharing

simon2.tar.gz: https://drive.google.com/file/d/0B--sVcawvPKfTDFWQVFaalhmd3c/view?usp=sharing

If you make a good model with the images, please be sure to post here so I and other can check it out in Neural-Style!

ProGamerGov avatar Aug 03 '16 04:08 ProGamerGov

"Why is it that the NIN model used by Neural-Style has many usable layers, that are not listed in the train_val.prototxt?"

Which layers and how did you find them? In principle, the training prototxt is the template according to which the model is originally created, so everything should be there (unless someone has removed layers from the prototxt).

Neural-style loads the model using loadcaffe, which requires the caffemodel and the prototxt as parameters. It is not fully clear to me how loadcaffe would behave if the prototxt did not include all layers. From the source it looks like it builds the model according to the prototxt, so any layers not present in it would not be available to neural-style.

htoyryla avatar Aug 03 '16 06:08 htoyryla

As to your hypotheses, my feeling is that supervised learning with labels is not the best way to train convlayers to respond to stylistic features. It may work but there's nothing to guarantee that it will. Training with labels makes the model produce the labels, and everything else is a side effect, to a large extent beyond control.

That's why I am interested in training using autoencoders, generative adversarial networks or something similar. For example, a model in which the training image is processed to a vector and then back to an image. The training is directed so that the resulting image is as close to the original as possible. No labels needed and the model learns directly about the images.

htoyryla avatar Aug 03 '16 07:08 htoyryla

I am not aware of any need to manually reshuffle data. That would ruin the idea of really heavy training which can run for days and weeks on its own.

Then you asked about multilabel training. It looks like there are ways to do it: http://stackoverflow.com/questions/32680860/caffe-with-multi-label-images

htoyryla avatar Aug 03 '16 07:08 htoyryla

That's why I am interested in training using autoencoders, generative adversarial networks or something similar. For example, a model in which the training image is processed to a vector and then back to an image. The training is directed so that the resulting image is as close to the original as possible. No labels needed and the model learns directly about the images.

@htoyryla Would you mind elaborating more on this?


I also discovered this modified script for using a model to label every image in a directory: https://groups.google.com/forum/#!topic/caffe-users/sLgqUgSM3XQ but it does not seem to work properly.

More info on using classify.py: https://groups.google.com/forum/#!searchin/caffe-users/classify|sort:relevance/caffe-users/YSzAIxnDI7w/KKo-0yofEwAJ

I found this example of classifying a single image:

./build/examples/cpp_classification/classification.bin models/own_net/deploy.prototxt examples/RSR_50k_all_1k_db/snapshot_iter_10000.caffemodel examples/RSR_50k_all_1k_db/mean.binaryproto examples/RSR_50k_all_1k_db/labels.txt /home/ubuntu/datasets/RSR_50k_1ll_1k/Testing/[0]/outfile243.jpg

ProGamerGov avatar Aug 03 '16 20:08 ProGamerGov

I ran this:

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/rocky_beach.jpg 2>&1 | tee ~/mylog.log

And got this output:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/rocky_beach.jpg ----------
0.7378 - "n09428293 seashore, coast, seacoast, sea-coast 978"
0.0909 - "n09399592 promontory, headland, head, foreland 976"
0.0823 - "n09421951 sandbar, sand bar 977"
0.0480 - "n02894605 breakwater, groin, groyne, mole, bulwark, seawall, jetty 460"
0.0199 - "n04606251 wreck 913"

ProGamerGov avatar Aug 03 '16 21:08 ProGamerGov

After testing, it seems that Places365-Hybrid is ok at identifying the images.

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image586.jpg 2>&1 | tee ~/mylog.log


[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image586.jpg ----------
0.0947 - "n03344393 fireboat 554"
0.0858 - "n04044716 radio telescope, radio reflector 755"
0.0807 - "n04606251 wreck 913"
0.0681 - "n03126707 crane 517"
0.0673 - "n03388043 fountain 562"

Image_586. The model does not seem to understand alien world environments very well.

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image585.jpg 2>&1 | tee ~/mylog.log


[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image585.jpg ----------
0.3935 - "n03126707 crane 517"
0.2111 - "n03216828 dock, dockage, docking facility 536"
0.0900 - "n02687172 aircraft carrier, carrier, flattop, attack aircraft carrier 403"
0.0805 - "n03393912 freight car 565"
0.0554 - "n04347754 submarine, pigboat, sub, U-boat 833"

Image_585

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image__9.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image__9.jpg ----------
0.3298 - "n09428293 seashore, coast, seacoast, sea-coast 978"
0.0906 - "n04606251 wreck 913"
0.0866 - "n09421951 sandbar, sand bar 977"
0.0630 - "n04251144 snorkel 801"
0.0442 - "n10565667 scuba diver 983"

Image__9

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image569.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image569.jpg ----------
0.1291 - "n03000684 chain saw, chainsaw 491"
0.1138 - "n03803284 muzzle 676"
0.0722 - "n04179913 sewing machine 786"
0.0398 - "n02130308 cheetah, chetah, Acinonyx jubatus 293"
0.0365 - "n03146219 cuirass 524"

Image569

./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt /home/ubuntu/caffe/data/image569.jpg 2>&1 | tee ~/mylog.log

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for /home/ubuntu/caffe/data/image551.jpg ----------
0.4913 - "n04296562 stage 819"
0.1085 - "n03691459 loudspeaker, speaker, speaker unit, loudspeaker system, speaker system 632"
0.0825 - "n04009552 projector 745"
0.0569 - "n03782006 monitor 664"
0.0331 - "n03180011 desktop computer 527"

Image551

ProGamerGov avatar Aug 03 '16 22:08 ProGamerGov

I tried to make a script that would test all of the images for whether or not it could label them, but it does not work. It can't find the files in the echoed command.

#!/usr/bin/env bash 
#echo "Script is running!"

num_val=0
            echo $num_val          

for ((n=0;n<5;n++))
do

num_val=$((num_val+1))
            echo $num_val   

        CMDone= 

            "bash ./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt data/s_art/simon1/image"$num_val".jpg"

            #echo $CMDone


            #sleep 10

done

Edit: I ran this variation of the script with this command:

ubuntu@ip-Address:~/caffe$ bash ./script_3.sh 2>&1 | tee ~/mylog.log

Script

#!/usr/bin/env bash 
#echo "Script is running!"

num_val=0
            #echo $num_val  
          cd caffe             

for ((n=0;n<586;n++))
do

num_val=$((num_val+1))
            #echo $num_val   

        CMDone= 

            "bash ./build/examples/cpp_classification/classification.bin models/places365/deploy_vgg16_hybrid1365.prototxt models/places365/vgg16_hybrid1365.caffemodel examples/imagenet/s_art_mean.binaryproto models/places365/categories_hybrid1365.txt data/s_art/simon1/image"$num_val".jpg"

            #echo $CMDone


            #sleep 100

done

I then used Notepad++ to remove ./script_3.sh: line 16: bash and : No such file or directory from every line. Then I added 2>&1 | tee ~/mylog.log to the first line and >> ~/mylog.log 2>&1 to every one of the other 585 lines.

Then I pasted this into the terminal/cml console: https://gist.github.com/ProGamerGov/f26d8f7adb90c8477b70bf157b1a7a18

Now the trick is to figure out how to use the output for labels. Maybe I can append the existing model's weights/content with the art images rather than creating new categories?
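For reference, a corrected version of the loop (a sketch, assuming the same paths): the command string needs to be executed directly rather than assigned to a variable, and classification.bin is a compiled binary, so the bash prefix has to go.

#!/usr/bin/env bash
# Classify image1.jpg ... image586.jpg and collect all output in one log.
for ((n=1;n<=586;n++)); do
  ./build/examples/cpp_classification/classification.bin \
      models/places365/deploy_vgg16_hybrid1365.prototxt \
      models/places365/vgg16_hybrid1365.caffemodel \
      examples/imagenet/s_art_mean.binaryproto \
      models/places365/categories_hybrid1365.txt \
      "data/s_art/simon1/image${n}.jpg"
done >> ~/mylog.log 2>&1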

ProGamerGov avatar Aug 03 '16 23:08 ProGamerGov

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image327.jpg ----------
0.8065 - "n03947888 pirate, pirate ship 724"
0.0731 - "n04606251 wreck 913"
0.0189 - "n03388043 fountain 562"
0.0101 - "n01704323 triceratops 51"
0.0074 - "n03240683 drilling platform, offshore rig 540"
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image328.jpg ----------
0.0903 - "n01824575 coucal 91"
0.0448 - "n13133613 ear, spike, capitulum 998"
0.0404 - "n12144580 corn 987"
0.0321 - "n01616318 vulture 23"
0.0294 - "n09472597 volcano 980"

That's the output saved in mylog.log. It should be possible to make a script that grabs the image location and applies the label values, but that might be beyond my skill level in this area. It would be interesting to see how the model responds in Neural-Style, even though the labels are not necessarily 100% correct.


Edit:

The full mylog.log file from all the images in the simon1 data set with the Hybrid Places 365 model: https://gist.github.com/ProGamerGov/8d792d6d7fb00167729262931c4089bf

The full mylog.log file from all the images in the simon1 data set with the Regular/Non-Hybrid Places 365 model:

https://gist.github.com/ProGamerGov/5a68492f98e4aa26197ef7bdbdce83a2

ProGamerGov avatar Aug 04 '16 00:08 ProGamerGov

So I guess I need to find something I can modify, or figure out how to make a script which can:

Take the data from a file containing 586 of these:

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image327.jpg ----------
0.8065 - "n03947888 pirate, pirate ship 724"
0.0731 - "n04606251 wreck 913"
0.0189 - "n03388043 fountain 562"
0.0101 - "n01704323 triceratops 51"
0.0074 - "n03240683 drilling platform, offshore rig 540"
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 559415190
---------- Prediction for data/s_art/simon1/image328.jpg ----------
0.0903 - "n01824575 coucal 91"
0.0448 - "n13133613 ear, spike, capitulum 998"
0.0404 - "n12144580 corn 987"
0.0321 - "n01616318 vulture 23"
0.0294 - "n09472597 volcano 980"

And put it into the structure of:

simon1/image1.jpg 1
simon1/image10.jpg 1
simon1/image100.jpg 1
simon1/image101.jpg 1
simon1/image102.jpg 1
simon1/image103.jpg 1
simon1/image104.jpg 1

Like this, in its own txt file:

simon1/image327.jpg 724 913 562 51 540
simon1/image328.jpg 91 998 987 23 980

Or I can just take what the network thinks is the most accurate answer:

simon1/image327.jpg 724
simon1/image328.jpg 91

Or we could label the majority, half, or a few of the images by only applying labels that have a high enough accuracy.
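A small parser along these lines should handle the top-1 case; here is a sketch (an assumption: the log keeps exactly the format shown above, with the class id as the last field of each prediction line):

#!/usr/bin/env bash
# Turn the classifier log into "simon1/imageNNN.jpg <class id>" lines.
awk '/Prediction for/ {
       path = $4                        # e.g. data/s_art/simon1/image327.jpg
       sub("^data/s_art/", "", path)    # keep only the simon1/... part
       getline                          # the next line is the top-1 prediction
       label = $NF                      # last field is the class id
       gsub(/"/, "", label)             # strip the closing quote
       print path, label
     }' ~/mylog.log > train.txt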

ProGamerGov avatar Aug 04 '16 00:08 ProGamerGov

Just remember that if the train and val sets do not contain examples for every label, it is likely to result in poor training. That's why I had to write a script to renumber the labels.

On the other hand, as you are not training it for classification, you could as well try to train with an incomplete label set and see what happens.

About multilabel training, it seems possible with caffe, but be prepared for problems; the post I linked gave only some ideas and pointers on how to do it.

htoyryla avatar Aug 04 '16 05:08 htoyryla

Just remember that if the train and val sets do not contain examples for every label, it is likely to result in poor training. That's why I had to write a script to renumber the labels.

@htoyryla Do you still have the script, so that I have something to base what I am trying to accomplish off of?

ProGamerGov avatar Aug 04 '16 05:08 ProGamerGov

"@htoyryla Would you mind elaborating more on this?"

Something like in this http://siavashk.github.io/2016/02/22/autoencoder-imagenet/ . In VGG terms one would remove, say, FC7 and FC8 and instead add a mirrored version of the conv layers to rebuild the image. Then train on images, computing the loss between the output and input images. VGG may, however, be difficult to train in this way. And caffe lacks the Unpooling layer needed, although there are caffe extensions that have one.
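A minimal caffe sketch of the idea, avoiding unpooling entirely: one strided conv layer as the encoder, one deconvolution layer as the decoder, and the input itself as the training target (layer names and sizes are assumptions, input taken as 3x224x224):

layer { name: "encode" type: "Convolution" bottom: "data" top: "encode"
  convolution_param { num_output: 16 kernel_size: 3 stride: 2 pad: 1 } }  # 3x224x224 -> 16x112x112
layer { name: "decode" type: "Deconvolution" bottom: "encode" top: "decode"
  convolution_param { num_output: 3 kernel_size: 4 stride: 2 pad: 1 } }   # 16x112x112 -> 3x224x224
layer { name: "loss" type: "EuclideanLoss" bottom: "decode" bottom: "data" top: "loss" }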

Generative adversarial networks are a more sophisticated solution. Two models, one produces an image, the other decides whether the image was real or fake. Both are trained in tandem. See for instance https://swarbrickjones.wordpress.com/2016/01/13/enhancing-images-using-deep-convolutional-generative-adversarial-networks-dcgans/ and https://github.com/soumith/dcgan.torch .

What I find attractive about such approaches is that it is possible to train without labels. Labeling is the main pain in creating datasets. Furthermore, labels are for classification, and style transfer is about images, not classification.

There is much recent work and many new applications that use this kind of network to work directly with images. It also looks like the recent work on neural style transfer concentrates on such networks.

htoyryla avatar Aug 04 '16 05:08 htoyryla

I'll check if I can find the script.

htoyryla avatar Aug 04 '16 05:08 htoyryla

I found those scripts but cannot remember exactly which ones I finally used and how. They are a quick and dirty solution I used to solve a one-off task.

This lua script looks like it reads a file containing all valid labels (those that exist in your dataset), each label number on its own line, and then opens val.txt and renumbers the labels from zero to maxlabel-1.

https://gist.github.com/htoyryla/7de83339101524c058da94ba6a176a47

I think the following lua script is what I used for labeling the images using an existing model. It outputs the filenames followed by the label given by the model. It also outputs the list of all labels that were found, to be given as input to the renumbering script. All of this goes into the same output stream, separated by a "-------------------" line. Direct all output to a file and then manually copy-paste the relevant parts into a train.txt and a valid_labels.txt.

https://gist.github.com/htoyryla/e4fea0efe127b3255ba791f6b4a2b2c6

The renumbering must be done for all images at the same time, and the splitting into train and val sets done later; otherwise the labels will not match. I generated a single all.txt and used the following python script to split it into train and val sets.

https://gist.github.com/htoyryla/fdf83cfd2c511627d02ef21f3d80afb4

htoyryla avatar Aug 04 '16 06:08 htoyryla

I found this Tensorflow based image classifier, which seems to be extremely easy to set up and use: https://github.com/llSourcell/tensorflow_image_classifier

The Tensorflow model on that Github page has to be trained on the categories that you want it to classify. You can use a browser extension like https://chrome.google.com/webstore/detail/fatkun-batch-download-ima/nnjjahlikiabnchcpehcpkdeckfgnohf?hl=en (for Chrome) to collect about 100 training images from Google Images for each category.

I am thinking that, with a simple script, this easy-to-use Tensorflow project could be used for labeling images for Caffe training. It is far easier to set up and configure: image "labeling" for the model is done by placing the images in the appropriate category directory that you created, and the rest seems pretty much automated.


https://github.com/BVLC/caffe/issues/2051#issuecomment-247765410

In the solver.prototxt, iter_size can be used to compensate for not using the recommended batch size. The default is iter_size: 1.
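As a sketch of how that plays out: the solver accumulates gradients over iter_size forward/backward passes before each weight update, so the effective batch size is batch_size * iter_size.

# train_val.prototxt data layer: batch_size: 16
# solver.prototxt:
iter_size: 4   # 16 x 4 = effective batch size of 64, at a quarter of the GPU memory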

ProGamerGov avatar Sep 15 '16 19:09 ProGamerGov