
Implementing features from the "Controlling Perceptual Factors in Neural Style Transfer" research paper

Open ProGamerGov opened this issue 8 years ago • 228 comments

I have been trying to implement the features described in the "Controlling Perceptual Factors in Neural Style Transfer" research paper.

The code that was used for the research paper can be found here: https://github.com/leongatys/NeuralImageSynthesis

The code from Leon Gatys' NeuralImageSynthesis is written in Lua and is operated through an iPython notebook interface.


So far, my attempts to transfer the features into Neural-Style have failed. Has anyone else had success in transferring the features?

Looking at the code, I think that:

In order to run NeuralImageSynthesis alongside your Neural-Style install, you must replace every instance of /usr/local/torch/install/bin/th with /home/ubuntu/torch/install/bin/th. You must also install hdf5 with luarocks install hdf5, matplotlib with sudo apt-get install python-matplotlib, skimage with sudo apt-get install python-skimage, and scipy with sudo pip install scipy. And of course you need to install and set up jupyter if you want to use the notebooks.

ProGamerGov avatar Feb 10 '17 21:02 ProGamerGov

Ok, I think I have gotten the new -reflectance parameter working, though I don't know what it does: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua

Though it seems to alter the output.

ProGamerGov avatar Feb 11 '17 21:02 ProGamerGov

Multires without -reflectance: https://i.imgur.com/LvpXgaW.png

Multires with -reflectance: https://i.imgur.com/YIiqsOx.png

The -reflectance command increases the GPU usage.

Content image: https://i.imgur.com/sgLtFDi.png

Style image: https://i.imgur.com/PsXIJLM.jpg

ProGamerGov avatar Feb 12 '17 01:02 ProGamerGov

It seems to me that your code inserts the new padding layer after the convolution layer, which has already done its padding, so that padding is done twice (first with zeroes in nn.SpatialConvolution and then by reflection in nn.SpatialReflectionPadding). It is like first adding an empty border and then another one which acts as a mirror. It would seem to me that the mirror then only reflects the empty border that was added first.

If you look closely at Gatys' code in https://github.com/leongatys/NeuralImageSynthesis/blob/master/ImageSynthesis.lua#L85-L94 you'll notice that the new padding layer is inserted first, and then the convolution layer without padding.

Your code also increases the size of the layer output, as padding is done twice, which might give size mismatch errors.

htoyryla avatar Feb 12 '17 07:02 htoyryla

In my previous comment, I overlooked the fact that it is possible to change the layer parameters after the layer has been added to the model. Thus the lines https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L140-L141 in fact remove the padding from the already inserted convolution layer, so the double padding does not happen and the size of the output is not changed.

Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.
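
For illustration, here is a minimal self-contained sketch of the "pad first, then convolve without padding" order described above (plain nn rather than cudnn, with a stand-in 3->64 VGG-style convolution; this is not the actual neural_style.lua code):

require 'nn'

local net = nn.Sequential()
local conv = nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1)   -- padW = padH = 1

local padW, padH = conv.padW, conv.padH
net:add(nn.SpatialReflectionPadding(padW, padW, padH, padH))  -- reflection padding first
conv.padW, conv.padH = 0, 0                                   -- turn off the conv's own zero padding
net:add(conv)                                                 -- then the convolution itself

-- The output size is unchanged: a 3x64x64 input gives a 64x64x64 output
print(net:forward(torch.randn(3, 64, 64)):size())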

htoyryla avatar Feb 12 '17 08:02 htoyryla

@htoyryla

Thus the main difference between your code and Gatys' is that you do padding after the convolution, while the normal practice is to do padding before convolution.

So the reflectance padding works correctly, though I have placed it in the wrong location?

This code here is the convolution: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L131-L142 ?

ProGamerGov avatar Feb 12 '17 20:02 ProGamerGov

And for implementing the masks, Gatys' implementation uses hdf5 files, though Neural-Style does not:

cmd:option('-mask_file', 'path/to/HDF5file', 'Spatial mask to constrain the gradient descent to specific region')

    -- Load mask if specified
    local mask = nil
    if params.mask_file ~= 'path/to/HDF5file' then
        local f = hdf5.open(params.mask_file, 'r')
        mask = f:all()['mask']
        f:close()
        mask = set_datatype(mask, params.gpu)
    end

I have been trying to figure out how to modify the above code for Neural-Style masks, but none of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?
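
One possible direction is a rough sketch like the following, which assumes a hypothetical -mask_file option pointing at a grayscale image rather than an HDF5 file, and uses the Torch 'image' package that neural_style.lua already requires:

require 'image'

-- Load a spatial mask from an ordinary image instead of an HDF5 file.
-- target_w/target_h would be the width/height the mask must be matched to.
local function load_mask(mask_file, target_w, target_h)
  local mask = image.load(mask_file, 1)          -- 1 x H x W, values in [0, 1]
  mask = image.scale(mask, target_w, target_h)   -- resize to the required size
  return mask[1]                                 -- return an H x W tensor
end

This only covers loading the mask; actually applying it inside the loss modules is a separate piece of work.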

ProGamerGov avatar Feb 12 '17 20:02 ProGamerGov

The code you now linked looks better: now the padding is inserted (line #127) before the convolution (line #141). Most of what you have highlighted is NOT the convolution but related to selecting between max and avg pooling. But if you follow the if logic, if the layer is a convolution it will be inserted into the model in line 141 of your present code.

I cannot guarantee that it works, but the padding and convolution now come in the correct order.

htoyryla avatar Feb 12 '17 20:02 htoyryla

"I have been trying to figure out how to modify the above code for Neural-Style masks, but non of my attempts to replace the hdf5 requirement have worked thus far. Any ideas?"

The code you cited does not implement any mask functionality, it only loads a mask from an existing hdf5 file.

htoyryla avatar Feb 12 '17 20:02 htoyryla

I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change. More testing and different parameter combinations could be needed to further understand its effect on artistic outputs.

On the left is the control test with -reflectance false, and on the right is -reflectance true:

Direct link to the comparison: https://i.imgur.com/YGCOCiu.png

False: https://i.imgur.com/0oQNsxl.png

True: https://i.imgur.com/a7fQTLb.png

Command used:

th neural_style.lua -seed 876 -reflectance -num_iterations 1500 -init image -image_size 640 -print_iter 50 -save_iter 50 -content_image examples/inputs/hoovertowernight.jpg -style_image examples/inputs/starry_night.jpg -backend cudnn -cudnn_autotune

ProGamerGov avatar Feb 12 '17 21:02 ProGamerGov

Are Gatys' Grad-related functions different than Neural-Style's? I'm looking for where the style masks come into play. Or should I be looking at different functions for implementing these features, like masks?

ProGamerGov avatar Feb 12 '17 22:02 ProGamerGov

From what I can see, luminance style transfer requires the LUV color space, which, unlike YUV, has no easy-to-use function in the image library.

Style masks seem to require modifying deeper levels of the Neural-Style code.


For the independent style_scale control with multiple style images, it seems like we only need a way to disable content loss:

From the research paper:

We initialise the optimisation procedure with the coarse-scale image and omit the content loss entirely, so that the fine-scale texture from the coarse-style image will be fully replaced.

And then a simple sh script similar to multires.sh, which runs your style images through Neural-Style first, should do the trick, but such a script needs a way to disable content loss.

I am thinking that a parameter like:

cmd:option('-content_loss', true, 'if set to false, content loss will be disabled')

if params.content_loss then
  -- content loss code
end

@htoyryla Which part of the content loss code should this be implemented on to achieve the desired effect?

https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L461-L497

Or: https://github.com/ProGamerGov/neural-style/blob/master/neural_style.lua#L109

Edit: I figured it out and now the content loss module can be disabled.

Currently testing different parameters alongside the new -content_loss parameter: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31

I edited this part of the neural_style.lua script: https://gist.github.com/ProGamerGov/7f3d2b6656e02a7a4a23071bd0999b31#file-neural_style-lua-L148-L151

Though I think that I need to find a way to transfer the color from the intended content image to this first Neural-Style run with the two style images. Seeing as -init image includes the content as well, maybe I need to add another new parameter, or maybe using -original_colors 1 on step two will solve this problem?

Second Edit:

It seems that -content_layers relu1_1,relu2_1 and the default style layers work the best, though the research paper only specified layers relu1_1 and relu2_1, not whether you should use those values for content or style layers.

ProGamerGov avatar Feb 12 '17 23:02 ProGamerGov

I must be missing something when trying to replicate the "Naive scale combination" from here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb

Following the steps in the research paper:


Should result in something like this output that I made running Gatys' iPython code: https://i.imgur.com/boz8PhW.jpg

And the styled style image from his code: https://i.imgur.com/6xEumk0.jpg


But instead I get this:

The styled style image: https://i.imgur.com/30HUeOH.png

And here is the final output: https://i.imgur.com/SWhzMn0.png

I tried this code to create the styled style image: https://gist.github.com/ProGamerGov/53979447d09fe6098d4b00fc8e924109

And then ran:

th neural_style_c.lua -original_colors 1 -output_image out.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out7.png -image_size 640 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


The final content image: https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_content.jpg

The two style images:

https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_style3.jpg

https://raw.githubusercontent.com/leongatys/NeuralImageSynthesis/master/Images/ControlPaper/fig4_style2.jpg


What am I doing wrong here?

ProGamerGov avatar Feb 13 '17 03:02 ProGamerGov

Ok, so analyzing the styled style image from Gatys' code:

The outputs have the parameters used, and the values used, in the name:

[scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05]

I think this was used to make this: https://i.imgur.com/6xEumk0.jpg


From another experiment using his code:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_Amazing-Nature_3840x2160.jpg_simg_raime.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

The enlarged version (I think 1 step multires?):

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg.filepart


And:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_raime.jpg_simg_Amazing-Nature_3840x2160.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg


The layers used are: relu2_1 and relu4_1

Style weight is: sw_2.0E+08

Content weight is: cw_1.0E+05

The Normalized VGG-19 model is used: model_norm

Not sure what this is: ptw_1.0E+05

Naive Scale mix is the best version, and also the styled style image: naive_scalemix.jpg

Not sure if pt_layer refers to both style_layers and content_layers, or just one of them?

ProGamerGov avatar Feb 13 '17 05:02 ProGamerGov

On the subject of Gram Matrices (Leon Gatys said this would be important for transferring features to Neural-Style):

Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:

In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code.

ProGamerGov avatar Feb 13 '17 05:02 ProGamerGov

"Neural-Style is normalising the Gram Matrices differently, as it additionally divides by the number of features, when compared with Gatys' code. This means that the style loss weights for the different layers in Neural-Style and Gatys' code are a little different:

In a layer l with n_l = 64 features, a style loss weight of 1 in Neural-Style, is a style loss weight of 1/64^2 in Gatys' code."

I am not familiar with Gatys' code, but what you wrote is confusing. First you say that Neural-Style divides the Gram matrix by the number of features, but in your example you don't do this division.

If Gatys' normalizes by 1/C^2 where C is the number of features, it makes sense to me as the size of the Gram matrix is CxC.

In neural_style, the gram matrix is normalized for style loss in the line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534. Here, input:nElement() is not C but CxHxW, where C,H,W are the dimensions of the layer to which the Gram matrix is added, so that in practice neural-style ends up with a smaller value for the normalized style loss than 1/C^2.

Dividing instead by self.G:nElement() would implement division by C^2, so if that's what you want, try it.

I don't know if this use of input:nElement() instead of self.G:nElement() here is intentional or an accident. @jcjohnson ?

There has been an earlier discussion about this division but there was nothing on this in particular: https://github.com/jcjohnson/neural-style/issues/90

PS. I checked the corresponding code in fast-neural-style https://github.com/jcjohnson/fast-neural-style/blob/master/fast_neural_style/GramMatrix.lua#L46-L49 which also normalizes the Gram matrix by 1/(CHW), so I guess this is done on purpose. After all, normalizing by 1/C^2 would favor the lower layers too much.
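
For concreteness, a small standalone sketch of the two normalizations being compared here (just the Gram computation with both divisors, not the actual StyleLoss module):

require 'torch'

-- Gram matrix of a C x H x W feature map
local function gram(feat)
  local C, H, W = feat:size(1), feat:size(2), feat:size(3)
  local flat = feat:view(C, H * W)
  return torch.mm(flat, flat:t())            -- C x C
end

local feat = torch.randn(64, 128, 128)       -- e.g. relu1_1-sized activations
local G = gram(feat)

local G_chw = G / feat:nElement()            -- 1/(C*H*W), as in neural_style.lua line 534
local G_c2  = G / G:nElement()               -- 1/C^2, closer to Gatys' normalization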

htoyryla avatar Feb 13 '17 08:02 htoyryla

I ran a quick test with the -reflectance option. The change is not particularly obvious at first glance, but it does appear to cause a change.

As padding only means adding a few pixels around the image I wouldn't expect large changes. Mostly this should be visible close to the edges, and indeed there appears to be a difference along the left hand side.

htoyryla avatar Feb 13 '17 08:02 htoyryla

Changing line https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L534 to divide by self.G:nElement(), I ran neural-style with defaults and got this.

outcxc

whereas with the original the resulting image was

outchw

Now, they are obviously different, but as the style weight has been effectively increased, we should not read too much into this difference. Anyway, this is worth more testing, and the idea of normalizing this way makes intuitive sense to me.

htoyryla avatar Feb 13 '17 09:02 htoyryla

Concerning YUV... I was under the impression that Y is the luminance.

When you want to disable content_loss, why not simply set content_weight to 0?

htoyryla avatar Feb 13 '17 10:02 htoyryla

It looks like the 1/C^2 style normalization favors the lowest layers which have smaller C (64 for conv1 as opposed to 512 for conv5). The original neural-style behavior 1/(CxHxW) penalizes less the higher layers because H and W decrease when going to higher layers.
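
A quick back-of-the-envelope check of this, using the standard VGG-19 shapes for a 512x512 input (relu5_1 sits after four poolings, so 32x32):

-- Compare the two normalizers at a low and a high layer
local layers = {
  relu1_1 = {64, 512, 512},    -- C, H, W
  relu5_1 = {512, 32, 32},
}
for name, s in pairs(layers) do
  local C, H, W = s[1], s[2], s[3]
  print(name, 1 / (C * H * W), 1 / (C * C))
end
-- relu1_1: 1/(CHW) ~ 6.0e-08, 1/C^2 ~ 2.4e-04
-- relu5_1: 1/(CHW) ~ 1.9e-06, 1/C^2 ~ 3.8e-06

So 1/C^2 weights relu1_1 roughly 64 times more heavily than relu5_1, while 1/(CHW) weights relu5_1 roughly 32 times more heavily than relu1_1.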

htoyryla avatar Feb 13 '17 12:02 htoyryla

When you want to disable content_loss, why not simply set content_weight to 0?

I will try that as well later today. I think my settings from before were too different from Gatys' settings.

The other issue is that I think transferring the color from a third image might be needed, as I would imagine that Gatys would have used something similar to -original_colors 1 if it were the better solution.

ProGamerGov avatar Feb 13 '17 17:02 ProGamerGov

I think I figured out the style combination:

The styled style image: https://i.imgur.com/G1eZerW.png

This was used to produce the final image:

th neural_style.lua -original_colors 1 -style_weight 10000 -output_image out3.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out1_200.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

And this was used to produce the styled style image:

th neural_style_c.lua -content_weight 0 -style_weight 10000 -output_image out1.png -num_iterations 200 -content_image fig4_style3.jpg -style_image fig4_style1.jpg -image_size 2800 -content_layers relu2_1 -style_layers relu2_1 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


I wonder if something similar could be accomplished by being able to control the layers each style image uses?


I am unable to produce a larger version like Gatys was able to do. Any larger images seem to be blurry, and the shapes begin to fade. The darkness of Seated Nude seems to make this harder, as the dark areas seem to take over areas on the new style image in my experiments.

ProGamerGov avatar Feb 14 '17 00:02 ProGamerGov

A note on 1/C^2 gram matrix normalization: this line also needs to be changed https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua#L553 so that the backward pass too will use the normalized matrix.

This will require quite different weights, like content_weight 1e3 and style_weight 1; it can take some 300 iterations before the image really starts to develop, but to me the results look good. I am talking about plain neural_style with modified Gram matrix normalization. Haven't really looked deeper into the Gatys project.

htoyryla avatar Feb 14 '17 09:02 htoyryla

ProGamerGov, just a little suggestion: since GPU handling is already implemented in "function setup_gpu(params)" (line 324), maybe it's possible to use that function instead of the new "set_datatype(data, gpu)"?

It could make the code more maintainable – in case of any changes someone will have to modify only one function instead of two.

For example: pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype) (see how nn.SpatialAveragePooling(kW, kH, dW, dH):type(dtype) is added in line 136).

Currently I can not test it on GPU, but I can confirm that it does work on CPU.

VaKonS avatar Feb 14 '17 12:02 VaKonS

@VaKonS

I'll take a look. I originally pasted in Gatys' GPU handling code at the time because I couldn't get the reflection function to work with this line of code:

pad_layer = set_datatype(pad_layer, params.gpu)

As I couldn't figure out how to use function setup_gpu with the code.

Are you saying to change this line:

https://github.com/ProGamerGov/neural-style/blob/6814479c8ebcc11498b7c123ee2ba7ef9f0fe09f/neural_style.lua#L125

to this:

local pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)

And then delete this line:

pad_layer = set_datatype(pad_layer, params.gpu)

?

ProGamerGov avatar Feb 14 '17 15:02 ProGamerGov

@ProGamerGov, yes. And to delete function set_datatype(data, gpu) at line 611, as it will not be needed anymore.

VaKonS avatar Feb 14 '17 16:02 VaKonS

@VaKonS , I made a version that contains other padding types: https://gist.github.com/ProGamerGov/0e7523e221935442a6a899bdfee033a8

When using -padding, you can try 5 different types of padding: default, reflect, zero, replication, or pad. In my testing, the pad option seems to leave untouched edges on either side of the image.

Edit: Modified version with htoyryla's suggestions: https://gist.github.com/ProGamerGov/5b9c9f133cfb14cf926ca7b580ea3cc8

The modified version only has 3 options: default, reflect, or replicate.

ProGamerGov avatar Feb 15 '17 01:02 ProGamerGov

Types 'reflect' and 'replication' make sense, although with the typical padding width = 1 as in VGG19 the result is identical.

Type 'zero' is superfluous as the convolution layer already pads with zeroes.

Type 'pad' only pads in one dimension so it hardly makes sense.

You should read the nn documentation when using the nn layers. The nn.Spatial.... layers are meant to work with two-dimensional data like images. nn.Padding provides lower-level access for padding of tensors: you need to specify which dimension, which side, and which value, and if one wants to use it to pad an image one needs to apply it several times with different settings.

But frankly, with the 1-pixel padding in VGG there are not so many ways to pad. We should also remember that the main reason for padding in the convolution layers is to get the correct output size. Without padding convolution tends to shrink the size.

htoyryla avatar Feb 15 '17 06:02 htoyryla

The code could also be structured like this (to avoid duplicating code and making the same checks several times). Here I used 'reflect' and 'replicate' as they are shorter; you may prefer 'replication' and 'reflection' as in the layer names. But having one as a verb and the other as a noun is maybe not a good idea.

local is_convolution = (layer_type == 'cudnn.SpatialConvolution' or layer_type == 'nn.SpatialConvolution')
if is_convolution and params.padding ~= 'default' then
    local padW, padH = layer.padW, layer.padH
    local pad_layer
    if params.padding == 'reflect' then
        pad_layer = nn.SpatialReflectionPadding(padW, padW, padH, padH):type(dtype)
    elseif params.padding == 'replicate' then
        pad_layer = nn.SpatialReplicationPadding(padW, padW, padH, padH):type(dtype)
    else
        error('Unknown padding type')
    end
    net:add(pad_layer)
    layer.padW = 0
    layer.padH = 0
end

htoyryla avatar Feb 15 '17 06:02 htoyryla

@htoyryla, reflective padding probably takes pixels starting from 1 pixel distance: [ x-2, x-1, x ] [ x-1, x-2 ]. And replication duplicates the edge: [ x-2, x-1, x ] [ x, x ].
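
This is easy to verify with a throwaway snippet (not part of neural_style.lua), padding a 1x1x3 row [1 2 3] by two pixels on each side:

require 'nn'

local t = torch.Tensor{{{1, 2, 3}}}                          -- 1 x 1 x 3 input
print(nn.SpatialReflectionPadding(2, 2, 0, 0):forward(t))    -- 3 2 1 2 3 2 1
print(nn.SpatialReplicationPadding(2, 2, 0, 0):forward(t))   -- 1 1 1 2 3 3 3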

VaKonS avatar Feb 15 '17 14:02 VaKonS

Yes, I just realized that when I did a small test. That explains why it made a difference also with a padding of one row/column. The documentation is a bit unclear, so I believed reflection would result in [ x-2, x-1, x ] [ x, x-1 ] when it only says 'reflection of the input boundary'. But obviously this is more useful.

htoyryla avatar Feb 15 '17 14:02 htoyryla

I have been trying to get this python script to work for the linear color feature found in Gatys' code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ScaleControl.ipynb

https://gist.github.com/ProGamerGov/5fc5ef9035edc9a026e41925f733a45c

The idea is that making this feature into a simple python script will be easier and less messy than implementing it in neural_style.lua. But I can't figure out the python parameters so that the image is fed into the function properly.

Edit:

Trying to reverse engineer the code that feeds into the function:

https://gist.github.com/ProGamerGov/32b7d68a098f8b0655d71a08eb3ba050

So far it doesn't output the converted images.

ProGamerGov avatar Feb 16 '17 01:02 ProGamerGov

About your first script https://gist.github.com/ProGamerGov/5fc5ef9035edc9a026e41925f733a45c

To make it process the images and save the result you need something like this. You did not pass the images to your function, and you did not use the resulting image returned by the function. Remember that the function parameters target_img and source_img are totally separate from the variables with the same names; usually it is good practice to avoid using the same names for both.

The numpy imports were needed; on the other hand, I had to use skimage.io instead of PIL for reading and saving the image, probably because they use a different format for the image inside python. Anyway, Gatys used imread() and not Image.open().

This works in principle but the resulting image is probably not what one would expect. It could be that some kind of pre/deprocessing is needed which was not obvious to me (not being familiar with the process you are trying to duplicate).

PS. imread returns an image where the data is between 0 and 255 as integers, while match_color expects 0..1 floats. That's why the result is not good yet.

import scipy
import h5py
import skimage
import os
from skimage import io,transform,img_as_float
from skimage.io import imread,imsave
from collections import OrderedDict
#from PIL import Image, ImageFilter
import numpy as np
from numpy import eye 
import decimal
#import click

target_img = imread('to.png')
source_img = imread('from.png')

def match_color(target_img, source_img, mode='pca', eps=1e-5):
    ....
    return matched_img

output_img = match_color(target_img, source_img)
imsave('result.png', output_img)

htoyryla avatar Feb 16 '17 07:02 htoyryla

OK, by still changing the two imread lines to

target_img = imread('to.png').astype(float)/256
source_img = imread('from.png').astype(float)/256

from these two images (from.png and to.png)

I get this (don't know if this is what is expected but it looks ok)

result

htoyryla avatar Feb 16 '17 16:02 htoyryla

Just noticed that there was already an import for img_as_float so these work as well

target_img = img_as_float(imread('to.png'))
source_img = img_as_float(imread('from.png'))

But anyway, I hope this illustrates that one cannot simply cut and paste code but needs also to examine it and make sure the pieces fit together.

htoyryla avatar Feb 16 '17 17:02 htoyryla

The script now seems to produce outputs like the ones Gatys' code produced in the iPython interface:

The source image:

The target images:

The images I used can be found in Gatys' repository here, and in my Imgur album here: https://imgur.com/a/PrKtg.

Before Gatys' Scale Control code tried to transfer the brush strokes onto the circular pattern image, it created images like these with the linear color transfer function. So I guess the next step is to test how well these modified style images work.

The working script: https://gist.github.com/ProGamerGov/73e6c242abc00777e4e8cf05cf39dc70

This code here:

target_img = img_as_float(imread('to.png'))
source_img = img_as_float(imread('from.png'))

Did not seem to work for me, though that could be a VirtualBox-related issue like the ones some ImageMagick scripts can cause.

ProGamerGov avatar Feb 16 '17 17:02 ProGamerGov

If img_as_float does not work, check that you have

from skimage import io,transform,img_as_float

(Just noticed that you have it. Don't know what is going on there if you have skimage installed in your python and can import it.)

And by the way, assuming you want to try all options, you can change the match_color mode and eps like this:

output_img = match_color(target_img, source_img, mode='chol', eps=1e-4)

htoyryla avatar Feb 16 '17 18:02 htoyryla

Python interpreter is useful for testing small things (just like th in lua):

Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import skimage
>>> from skimage import io,transform,img_as_float
>>> from skimage.io import imread,imsave
>>> img_as_float
<function img_as_float at 0x7f3bc9cad230>
>>> img = imread('to.png')
>>> img
array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [133, 119, 112],
        [101,  84,  85],
        [ 54,  45,  44]],
>>> img_as_float(img)
array([[[ 1.        ,  1.        ,  1.        ],
        [ 1.        ,  1.        ,  1.        ],
        [ 1.        ,  1.        ,  1.        ],
        ...,
        [ 0.52156863,  0.46666667,  0.43921569],
        [ 0.39607843,  0.32941176,  0.33333333],
        [ 0.21176471,  0.17647059,  0.17254902]],

htoyryla avatar Feb 16 '17 18:02 htoyryla

I got the script to accept user specified parameters: https://gist.github.com/ProGamerGov/d0917848a728bceb4131272734f61e8b

Only the target and source image are required, but you can also control the eps value and the transfer mode. Though the --eps parameter currently only accepts values in scientific notation.

I also cleaned up the unused lines of code.

I am currently testing different parameters for scale control.

ProGamerGov avatar Feb 17 '17 06:02 ProGamerGov

It seems you don't understand how functions work. When one defines a function like match_color one specifies the parameters that are input to the function when it is called.

When one calls the function one gives the actual values for those parameters. One can then call the function as many times as needed with different values.

What you are doing now is defining a function so that the default values of transfer_mode and eps are defined from user input. It works when you only run the function once but it is confusing. That is not the way to pass values into a function.

You should change the def line as it was and add the actual values of transfer_mode and eps to the line where the function is called (like I already suggested).

output_img = match_color(target_img, source_img, mode=transfer_mode, eps=int(float(eps_value)))

BTW, I don't understand the int() for eps... first we give something like 1e-5, then float it and finally int which gives 0. So you limit eps to integer values only? Why the int? Float(eps_value) should be enough to convert the input string into a number.

htoyryla avatar Feb 17 '17 07:02 htoyryla

It seems you don't understand how functions work.

It works when you only run the function once but it is confusing. That is not the way to pass values into a function.

I went for making the code work, without putting a lot of focus on how. Which is a terrible way to go about coding.

You should change the def line as it was and add the actual values of transfer_mode and eps to the line where the function is called (like I already suggested).

Yea, I see that now. Not sure what I was thinking at the time when I made such an embarrassing and obvious mistake.

BTW, I don't understand the int() for eps... first we give something like 1e-5, then float it and finally int which gives 0. So you limit eps to integer values only? Why the int?

It was the first thing that worked, which I now think was because I fixed a bracket placement error. I have removed the integer limitation.

Thanks for helping me correct the issues!

ProGamerGov avatar Feb 17 '17 08:02 ProGamerGov

I think I am getting close to the research paper's results:

Layers relu2_1,relu4_2:

Direct link to full image: https://i.imgur.com/Vo9p96O.png

I know the research paper talks about only using layers relu1_1 and relu2_1, but the fine brush strokes from the paint style image seem to work best with relu2_1 and relu4_2, or just relu4_2, at least with this coarse style image. I'm not sure if I am missing something, or if this is due to a difference between Gatys' and jcjohnson's code?

This was my content image: https://i.imgur.com/eoX7f3I.jpg

Control test without scale control:

Screenshot from the research paper:


I used this command to create my "stylemix" image:

 th neural_style.lua -tv_weight 0 -content_weight 0 -style_weight 10000 -output_image out5.png -num_iterations 550 -content_image result.png -style_image result_3.png -image_size 1536 -content_layers relu2_1,relu4_2 -style_layers relu2_1,relu4_2 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

Then I used this two step set of commands to create the final output:

th neural_style.lua -style_weight 10000 -output_image out_final.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out5_pca.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

th neural_style.lua -style_weight 10000 -output_image out_final_hr.png -num_iterations 550 -content_image fig4_content.jpg -init_image out_final.png -style_image out5_pca.png -image_size 1536 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

I used the default linear-color-transfer.py script on my stylemix image before using it to create my final output, so the colors are more vivid than Gatys' version in the research paper. The default linear-color-transfer.py script was also used on both style images before I added the fine style to the coarse style. Both times I used the final content image with the city lights as the source image.

ProGamerGov avatar Feb 17 '17 08:02 ProGamerGov

Can you give the commands for running the whole process? I would like to test.

htoyryla avatar Feb 17 '17 08:02 htoyryla

@htoyryla


Images used:

fig4_content.jpg: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_content.jpg

Fine style: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_style1.jpg

Coarse style: https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig4_style2.jpg


Step 1:

python linear-color-transfer.py --target_image coarse_style.png --source_image fig4_content.jpg --output_image coarse_pca.png

python linear-color-transfer.py --target_image fine_style.png --source_image fig4_content.jpg --output_image fine_pca.png

Step 2 (Gatys called the output from this step "stylemix", but I used a generic name from the list of experiments I was running):

th neural_style.lua -tv_weight 0 -content_weight 0 -style_weight 10000 -output_image out5.png -num_iterations 550 -content_image coarse_pca.png -style_image fine_pca.png -image_size 1536 -content_layers relu2_1,relu4_2 -style_layers relu2_1,relu4_2 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

Step 2.5 (I don't think Gatys' code does this, but I thought it would make the colors look better):

python linear-color-transfer.py --target_image out5.png --source_image fig4_content.jpg --output_image out5_pca.png

Step 3:

Then I tried to mimic Gatys' two-step process where the first image is generated at 512px:

th neural_style.lua -style_weight 10000 -output_image out_final.png -num_iterations 1000 -content_image fig4_content.jpg -style_image out5_pca.png -image_size 512 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

th neural_style.lua -style_weight 10000 -output_image out_final_hr.png -num_iterations 550 -content_image fig4_content.jpg -init_image out_final.png -style_image out5_pca.png -image_size 1536 -save_iter 0 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune


Those commands in that order should give you the exact same output as I got.

ProGamerGov avatar Feb 17 '17 08:02 ProGamerGov

After making more tests with different models, I was wrong: the noise is not added by padding. It's a property of some models: vgg19 from crowsonkb's repository makes clean images with or without padding, and images made with Illustration2Vec, for example, have noisy borders even with default padding.

noise

VaKonS avatar Feb 17 '17 16:02 VaKonS

Examining the outputs produced by ScaleControl.ipynb:


Gatys' Scale Control code produces 3 different outputs, each of which follows a two-step Multires process. I am not sure if these are 3 different ways of doing Scale Control, or if 1 or 2 of them are meant to showcase ways that don't work?

For each of the 3 options in the iPython script, I ran the code and generated the images. Each produced a low resolution 648x405 image and then a 1296x810 "hr" resolution image. Though the image names say that the first image has a resolution of 512px and the second image has a resolution of 1024px, which means there may be something else going on here (maybe downsampling?). I have included both images for each example, and they can be viewed in full in the Imgur link below each example.

Gatys' iPython code names the images with the parameters used to create them, and as such I have included the image file names.

"Stylemix images" are what Gatys calls the resulting combination style image made of both the coarse and fine style images.


Combine 2 images with fine and coarse scale:

low res and hr res: https://imgur.com/a/D7AcK

  • File names:

low res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_org_pad_sw_1.0E+03_cw_1.0E+00.jpg

hr:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_hrpt_layer_relu4_1_sz_512_hrsz_1024_model_org_pad_sw_1.0E+03_cw_1.0E+00.jpg

iPython terminal output: https://gist.github.com/ProGamerGov/a613c42514b9059ebc8230d2c1cd0fd1

norm net:

low res and hr res: https://imgur.com/a/oTB1k

  • File names:

low res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05.jpg

hr res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_hrpt_layer_relu4_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05.jpg

iPython terminal output: https://gist.github.com/ProGamerGov/3d8f8ffdbde5f8ec69c46f3076fa3f2d

Naive scale combination:

low res and hr res: https://imgur.com/a/LbqJQ

  • File names:

low res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

hr res:

cimg_cm_fig4_content.jpg_scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_hrsz_1024_model_norm_pad_sw_2.0E+08_cw_1.0E+05_naive_scalemix.jpg

iPython terminal output: https://gist.github.com/ProGamerGov/71eda3b16793835bbe142d902c480fe7


The code, in addition to creating the two images for each example, also created 4 stylemix images:

Stylemix image: https://i.imgur.com/m7nRgKP.jpg

Name:

scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_norm_pad_ptw_1.0E+05.jpg

Stylemix image: https://i.imgur.com/XPt6N52.jpg

Name:

scimg_fig4_content.jpg_spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_norm_pad_ptw_1.0E+05

Stylemix image: https://i.imgur.com/Vf3mg2n.jpg

Name:

spimg_fig4_style2.jpg_simg_fig4_style3.jpg_hrpt_layer_relu4_1_hrsz_1024_model_org_pad_ptw_1.0E+03.jpg

Stylemix image: https://i.imgur.com/c1ZNDoZ.jpg

Name:

spimg_fig4_style2.jpg_simg_fig4_style3.jpg_pt_layer_relu2_1_sz_512_model_org_pad_ptw_1.0E+03.jpg

I am not sure why there are 3 Examples of Scale Control, and 4 stylemix images. But I assume one of the examples must use 2 stylemix images?

ProGamerGov avatar Feb 18 '17 01:02 ProGamerGov

Ok, so trying both models from Gatys' repository, which are the normalized VGG-19 and the VGG-19 conv model, I can't seem to get the parameters right. Up until now I was using the default VGG-19 model.

wget -c --no-check-certificate https://bethgelab.org/media/uploads/deeptextures/vgg_normalised.caffemodel
wget -c --no-check-certificate https://bethgelab.org/media/uploads/stylecontrol/VGG_ILSVRC_19_layers_conv.caffemodel

I assume the default Neural-Style VGG-19 prototxt may not work with these models?

wget -c https://gist.githubusercontent.com/ksimonyan/3785162f95cd2d5fee77/raw/bb2b4fe0a9bb0669211cf3d0bc949dfdda173e9e/VGG_ILSVRC_19_layers_deploy.prototxt

Edit: It seems that the models are special versions created by Leon Gatys: https://github.com/jcjohnson/neural-style/issues/7

I don't know why, but I can't seem to get either model to work.

ProGamerGov avatar Feb 18 '17 02:02 ProGamerGov

Using Gatys' weights for Scale Control in Neural-Style seems to work pretty well:

Also, the sym option on the match_color function is for luminance style transfer.

ProGamerGov avatar Feb 18 '17 03:02 ProGamerGov

@VaKonS Thanks, I'll take a look at that.


@htoyryla I have started trying to extract the python code responsible for luminance style transfer: https://gist.github.com/ProGamerGov/08c5d25bb867e4313821a45b2e3b2978

As I understand it, the research paper basically describes converting your content/style images to LUV or YIQ, before running them through the style transfer network. In his python code, Gatys appears to use LUV, so I'll start with that.

Testing those 3 functions:

rgb2luv creates this:

luv2rgb creates this:

lum_transform results in this error whenever I try to use it:

ubuntu@ip-Address:~/neural-style$ python lum2.py --input_image fig4_content.jpg
Traceback (most recent call last):
  File "lum2.py", line 47, in <module>
    output_img = lum_transform(input_img)
  File "lum2.py", line 32, in lum_transform
    img = tile(lum[None,:],(3,1)).reshape((3,image.shape[0],image.shape[1]))
NameError: global name 'tile' is not defined
ubuntu@ip-Address:~/neural-style$

I don't know what "tile" is from and I can't figure out whether it belongs to a package, or is related to another specific variable.

Edit: "tile" is part of numpy, and just like "eye" from linear-color-transfer.py, it needs a "np." appended to it's front.

lum_transform creates this:

These functions come from Gatys' Color Control code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ColourControl.ipynb

I am not sure if lum_transform is needed to perform the LUV conversion back and forth for style transfer.
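
For comparison, here is a rough Lua/Torch counterpart of what lum_transform appears to do (a sketch only; the assumption is that Gatys' code uses the usual Rec. 601 luma weights):

require 'image'

-- Project an RGB image onto its luminance and replicate it across 3 channels
local function lum_transform(img)                -- img: 3 x H x W, values in [0, 1]
  local lum = 0.299 * img[1] + 0.587 * img[2] + 0.114 * img[3]
  return torch.repeatTensor(lum, 3, 1, 1)
end

local grey = lum_transform(image.load('fig3_content.jpg', 3))
image.save('fig3_content_lum.png', grey)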

ProGamerGov avatar Feb 18 '17 04:02 ProGamerGov

Edit: "tile" is part of numpy, and just like "eye" from linear-color-transfer.py, it needs a "np." appended to it's front.

Good you found it. I just woke up and was about to comment :)

htoyryla avatar Feb 18 '17 05:02 htoyryla

How do the Gatys models fail when you try them?

I can load the conv model using the prototxt, both from the links you gave. Also runs fine in neural_style.

th> require "loadcaffe"
{
  load : function: 0x416b6098
  C : userdata: 0x41383dc8
}
th> model_file = "VGG_ILSVRC_19_layers_conv.caffemodel"
                                                                      [0.0001s]
th> proto_file = "VGG_ILSVRC_19_layers_deploy.prototxt"
                                                                      [0.0001s]
th> cnn = loadcaffe.load(proto_file, model_file, "nn")
Successfully loaded VGG_ILSVRC_19_layers_conv.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
                                                                      [0.2703s]
th> cnn
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
  (1): nn.SpatialConvolution(3 -> 64, 3x3, 1,1, 1,1)
  (2): nn.ReLU
  (3): nn.SpatialConvolution(64 -> 64, 3x3, 1,1, 1,1)
  (4): nn.ReLU
  (5): nn.SpatialMaxPooling(2x2, 2,2)
  (6): nn.SpatialConvolution(64 -> 128, 3x3, 1,1, 1,1)
  (7): nn.ReLU
  (8): nn.SpatialConvolution(128 -> 128, 3x3, 1,1, 1,1)
  (9): nn.ReLU
  (10): nn.SpatialMaxPooling(2x2, 2,2)
  (11): nn.SpatialConvolution(128 -> 256, 3x3, 1,1, 1,1)
  (12): nn.ReLU
  (13): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (14): nn.ReLU
  (15): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (16): nn.ReLU
  (17): nn.SpatialConvolution(256 -> 256, 3x3, 1,1, 1,1)
  (18): nn.ReLU
  (19): nn.SpatialMaxPooling(2x2, 2,2)
  (20): nn.SpatialConvolution(256 -> 512, 3x3, 1,1, 1,1)
  (21): nn.ReLU
  (22): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (23): nn.ReLU
  (24): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (25): nn.ReLU
  (26): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (27): nn.ReLU
  (28): nn.SpatialMaxPooling(2x2, 2,2)
  (29): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (30): nn.ReLU
  (31): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (32): nn.ReLU
  (33): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (34): nn.ReLU
  (35): nn.SpatialConvolution(512 -> 512, 3x3, 1,1, 1,1)
  (36): nn.ReLU
  (37): nn.SpatialMaxPooling(2x2, 2,2)
}

htoyryla avatar Feb 18 '17 05:02 htoyryla

@htoyryla

How do the Gatys models fail when you try them?

The style loss function does not work for me with either model (the values basically stay the same), with any variation of this command:

th neural_style.lua -content_weight 0 -style_weight 10000 -image_size 1024 -output_image out_norm_hr.png -num_iterations 500 -content_image result_2.png -style_image result.png -content_layers relu2_1,relu4_1 -style_layers relu2_1,relu4_1 -model_file models/vgg_normalised.caffemodel -proto_file models/VGG_ILSVRC_19_layers_deploy.prototxt -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

ProGamerGov avatar Feb 18 '17 05:02 ProGamerGov

For me this

th neural_style.lua -model_file ../test/pg/VGG_ILSVRC_19_layers_conv.caffemodel -proto_file ../test/pg/VGG_ILSVRC_19_layers_deploy.prototxt

works on a clean copy of neural-style and produces this image at 400 iterations.

out_400

I see you tried the normalized... I have almost never used the normalized models; they probably require different weight values.

I now tried using -init image -content_weight 0 and it works too.

htoyryla avatar Feb 18 '17 06:02 htoyryla

The normalized model works too, but the losses are very small (which I think is typical for a normalized model), and it does not produce the expected result image -- probably one would need a much higher style weight.

Iteration 50 / 1000
  Content 1 loss: 0.000000
  Style 1 loss: 4.361065
  Style 2 loss: 0.464805
  Style 3 loss: 0.069460
  Style 4 loss: 0.017796
  Style 5 loss: 0.015553
  Total loss: 4.928679
Iteration 100 / 1000
  Content 1 loss: 0.000000
  Style 1 loss: 3.661895
  Style 2 loss: 0.397874
  Style 3 loss: 0.057522
  Style 4 loss: 0.017153
  Style 5 loss: 0.015032
  Total loss: 4.149477

htoyryla avatar Feb 18 '17 06:02 htoyryla

Looking into where the functions are located in Gatys' code here: https://github.com/leongatys/NeuralImageSynthesis/blob/master/ExampleNotebooks/ColourControl.ipynb

lum_transform seems to come before the style transfer process:

 if cp_mode == 'lum':
        org_content = imgs['content'].copy()
        for cond in conditions:
            imgs[cond] = lum_transform(imgs[cond])
        imgs['style'] -= imgs['style'].mean(0).mean(0)
        imgs['style'] += imgs['content'].mean(0).mean(0)
        for cond in conditions:
            imgs[cond][imgs[cond]<0] = 0
            imgs[cond][imgs[cond]>1] = 1

And then rgb2luv and luv2rgb are used:

#execute script 
    !{script_name}
    output = deprocess(get_torch_output(output_file_name))
    if cp_mode == 'lum':
        org_content = rgb2luv(org_content)
        org_content[:,:,0] = output.mean(2)
        output = luv2rgb(org_content)
        output[output<0] = 0
        output[output>1]=1
    imshow(output);gcf().set_size_inches(8,14);show()
    imsave(result_dir + result_image_name, output)

But I am not sure exactly what the code is doing. I think it's doing something with the style transfer output file, and the input file. Though I haven't been able to get the code that uses rgb2luv and luv2rgb to work properly yet.

ProGamerGov avatar Feb 18 '17 07:02 ProGamerGov

@htoyryla If they are working for you, then it could have been something weird with the instance I was running, or a corrupted/incorrect prototxt file with the same name.

ProGamerGov avatar Feb 18 '17 07:02 ProGamerGov

I downloaded the models and the prototxt from the links you gave, otherwise a fresh copy of neural-style. Maybe you have some modifications in your neural-style? The conv model worked just as usual, the normalized one didn't produce a good image with the default content and style weights, but I've seen that before too.

htoyryla avatar Feb 18 '17 07:02 htoyryla

Difficult to say what a code snippet is doing without being familiar with the whole of it.

But from the paper the color control looks basically simple.

"The modification is simple. The luminance channels LS and LC are first extracted from the style and content images. Then the Neural Style Transfer algorithm is applied to these images to produce an output luminance image Lˆ. Using the YIQ colour space, the colour information of the content image is represented by the I and Q channels; these are combined with Lˆ to produce the final colour output image (Fig. 3(d))."

In other words, one makes luminance-only versions of the content and style images, runs them through neural-style and finally applies the color information from the content image. I am not too familiar with these color spaces, but it would be simple to try how this works using YUV and neural-style.
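
As a minimal sketch in Torch of that last recombination step (essentially what -original_colors 1 already does, using YUV as a stand-in for the paper's YIQ; content.jpg and out.png are placeholder file names):

require 'image'

local content = image.load('content.jpg', 3)
local output  = image.load('out.png', 3)      -- result of the luminance-only style transfer

-- Y (luminance) from the stylised output, U and V from the content image
local content_yuv = image.rgb2yuv(image.scale(content, output:size(3), output:size(2)))
local out_yuv     = image.rgb2yuv(output)
content_yuv[1]:copy(out_yuv[1])

image.save('out_colored.png', image.yuv2rgb(content_yuv))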

Gatys then goes deeper into using histogram matching for the cases where the histograms of the two images are quite different.

htoyryla avatar Feb 18 '17 08:02 htoyryla

Speaking of "vgg_normalised.caffemodel": together with much higher style weight, it probably requires "normalize_gradients". In above examples with this model, the content weight was 100, style weight 300000, and gradients were "semi-normalized" with scale 0.7.

VaKonS avatar Feb 18 '17 10:02 VaKonS

Related to the discussion above, I made a quick attempt to run neural-style on luminance (Y) only when -original_colors == 1, then add color (UV) from the content image: https://gist.github.com/htoyryla/38a4d6b2280ed5b4e47fc8d67b304f9f

Using my modified code, Gatys' VGG19 conv model and neural-style defaults: style transfer with luminance only, followed by transferring color from the content image (Gatys' basic color control):

out_900

For comparison, original_colors == 0 (style transfer with color):

out2_800

For comparison, unmodified neural_style with original_colors == 1 (style transfer with color, followed by transferring color from the content image)

out3_900

Looks like luminance-only style transfer makes a visible difference in the sky.

htoyryla avatar Feb 18 '17 10:02 htoyryla

In above examples with this model, the content weight was 100, style weight 300000, and gradients were "semi-normalized" with scale 0.7.

I can't see which examples you are referring to, but never mind. I was only suggesting that probably there is nothing wrong with the models.

htoyryla avatar Feb 18 '17 10:02 htoyryla

I further tried to add the first histogram adjustment by Gatys (formula 10 in the paper), adjusting the luminance-only style image (before preprocessing) so that its mean and variance match the luminance-only content image:

  local cmean = torch.mean(content_image)
  local cvar = torch.var(content_image)
  for _, img_path in ipairs(style_image_list) do
    local img = image.load(img_path, 3)
    img = image.scale(img, style_size, 'bilinear')
    if params.original_colors == 1 then
      -- use luminance only
      img = image.rgb2yuv(img)[{{1, 1}}]
      -- match histogram
      local smean = torch.mean(img)
      local svar = torch.var(img)
      img = img:add(-smean):mul(cvar/svar):add(cmean)
    end
    local img_caffe = preprocess(img):float()
    table.insert(style_images_caffe, img_caffe)
  end

I guess something went wrong, but somehow I like how this looks:

out3d

htoyryla avatar Feb 18 '17 12:02 htoyryla

In this version I separated style transfer (color or luminance), histogram matching (none, whole tensor, channel-wise) and original_colors, to allow trying different approaches (for instance doing style transfer with color, histogram matching per channel and finally restoring original colors). Histogram matching is by mean and var only.

Note that I have not really looked at Gatys' code other than some snippets posted here, so this is simply based on reading a page of Gatys' paper. Do not assume that my params work like Gatys'.

-- quick hack to make neural-style do
-- 1 either luminance-only or color based style transfer (--transfer lum|color)
-- 2 match style histogram to content (--histogram no|all|rgb)
--    no: no histogram matching
--    all: match whole tensor (use this with --transfer lum)
--    rgb: match each channel separately
-- 3 restore original colors can be combined with the above
-- NOT ALL COMBINATIONS ARE GUARANTEED TO WORK

https://gist.github.com/htoyryla/af9de7a712d74d12f5d3acc7725e6229

PS. I am quite happy with this now. Made this picture from my own portrait and Picasso's Seated nude.

hannu5e

htoyryla avatar Feb 18 '17 14:02 htoyryla

So using this terrible and messy code: https://gist.github.com/ProGamerGov/ba9a9d54bae53e84ebf0116262df6758

I think I have achieved luminance style transfer without editing neural_style.lua:


Images used:

https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig3_style1.jpg

https://github.com/leongatys/NeuralImageSynthesis/blob/master/Images/ControlPaper/fig3_content.jpg


Step 1:

First you transfer the color from your content image to your style image like so:

python linear-color-transfer.py --target_image fig3_style1.jpg --source_image fig3_content.jpg --output_image style_colored_pca.png

Then you run that through the lum_transfer.py script like this:

python lum_transfer.py --content_image fig3_content.jpg --style_image style_colored_pca.png --cp_mode lum --output_style_image output_lum_style_pca.png --output_content_image output_lum_style_pca.png

Now you run your content image through lum_transfer.py like this:

python lum_transfer.py --content_image fig3_content.jpg  --style_image fig3_content.jpg --cp_mode lum --output_content_image out_lum_transfer.png

Step 2:

Now you run both newly created images through Neural-Style like this:

th neural_style.lua -original_colors 0 -image_size 1000 -content_weight 1 -style_weight 1e3 -output_image out_lum6_test.png -content_image out_lum_transfer.png -style_image output_lum_style_pca.png -num_iterations 1500 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

Once you've finished with Neural-Style, you use the lum_transfer.py script again like this:

python lum_transfer.py --output_lum2 out_lum6_test_400_r.png --content_image out_lum6_test_400.png --style_image output_style_colored.png --cp_mode lum2 --output_style_image output_style_2.png --output_content_image output_content_2.png --output_image out_combined.png --org_content fig3_content.jpg

Step 3 (Optional):

Now you can leave your image as is, or you can try to change the colors slightly like this:

python linear-color-transfer.py --target_image out_combined.png --source_image fig3_content.jpg --output_image out_combined_lct_pca.png

python linear-color-transfer.py --mode sym --target_image out_combined.png --source_image fig3_content.jpg --output_image out_combined_lct_sym.png

python linear-color-transfer.py --mode chol --target_image out_combined.png --source_image fig3_content.jpg --output_image out_combined_lct_chol.png

And here are my outputs generated by this process:

The final unmodified output:

--mode pca:

--mode sym:

--mode chol:

Album of the full resolution final images: https://imgur.com/a/AW6qU

Screenshot from the research paper:

Control Output using -original_colors 1 and the unmodified style/content images:

The control output with all 3 modes of linear color transfer using linear-color-transfer.py, for comparison: https://imgur.com/a/82xsu

The unmodified content image:


ProGamerGov avatar Feb 19 '17 03:02 ProGamerGov

Looking at the research paper's examples and my examples, it seems really difficult to actually see the difference between luminance style transfer and normal style transfer.

@htoyryla Your modifications to neural_style.lua created this (Though I used -image_size 649 for this output instead of the -image_size 1000 that I used for the above outputs):

And using linear-color-transfer.py has little to no effect on the output with all 3 modes: https://imgur.com/a/wmNyO

My outputs look closer to the research paper's outputs, but yours seems to have more vivid "light spots".

There are more examples from the research paper, and Gatys' Github code here: https://github.com/ProGamerGov/Neural-Tools/wiki/NeuralImageSynthesis-Color-Control-Examples

The -original_colors 1 parameter in Neural-Style uses YUV, where the "Y" part deals with luminance. So this parameter's effect on our results should be noted. Though I am unclear on what part, if any, linear-color-transfer.py plays with luminance.

ProGamerGov avatar Feb 19 '17 06:02 ProGamerGov

The -original_colors 1 parameter in Neural-Style uses YUV, where the "Y" part deals with luminance. So this parameter's effect on our results should be noted.

-original_colors 1 does not affect the style transfer at all; it only adjusts the output image so that luminance comes from the output image and color from the content image.

One of Gatys' proposals is to make style transfer using luminance only and only afterwards add color, just like with -original_colors 1. Note that this is different from doing the style transfer in color and then copying color from the content image.

The modification is simple. The luminance channels LS and LC are first extracted from the style and content images. Then the Neural Style Transfer algorithm is applied to these images to produce an output luminance image Lˆ. Using the YIQ colour space, the colour information of the content image is represented by the I and Q channels; these are combined with Lˆ to produce the final colour output image (Fig. 3(d)).

In my latter version this is done with -transfer lum -original_colors 1.
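To make that step concrete, here is a minimal numpy sketch of the recombination (my illustration, not code from the paper or from neural-style): it assumes the stylised result and the original content image are HxWx3 float arrays in [0, 1] and uses the standard NTSC RGB/YIQ matrices.

import numpy as np

# Approximate NTSC RGB -> YIQ matrix; the inverse is derived from it.
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])
YIQ2RGB = np.linalg.inv(RGB2YIQ)

def recombine(stylized, content):
    """Keep the luminance (Y) of the stylised result and the color (I, Q) of the content image."""
    yiq = content @ RGB2YIQ.T            # content image in YIQ
    yiq[..., 0] = stylized @ RGB2YIQ[0]  # replace its Y channel with the stylised luminance
    return np.clip(yiq @ YIQ2RGB.T, 0.0, 1.0)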

Gatys then goes on

If there is a substantial mismatch between the luminance histogram of the style and the content image, it can be helpful to match the histogram of the style luminance channel LS to that of the content image LC before transferring the style. For that we simply match mean and variance of the content luminance.

In my latter version, this is done (although in YUV space) by -transfer lum -histogram all -original_colors 1
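For illustration, that mean-and-variance matching of the luminance channel boils down to something like this numpy sketch (not the actual code of either implementation):

import numpy as np

def match_lum(style_lum, content_lum, eps=1e-8):
    """Shift/scale the style luminance so its mean and variance match the content luminance."""
    mu_s, sigma_s = style_lum.mean(), style_lum.std()
    mu_c, sigma_c = content_lum.mean(), content_lum.std()
    return np.clip((style_lum - mu_s) * sigma_c / (sigma_s + eps) + mu_c, 0.0, 1.0)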

I have not implemented the color matching method described in section 5.2 of the paper. However, when he later writes

In comparison, when using full style transfer and colour matching, the output image really consists of strokes which are blotches of paint, not just variations of light and dark.

I wanted to try something like this, although with simple histogram matching, setting -transfer color -histogram rgb -original_colors 1

Note that when doing histogram matching, the change in the levels of the style image may require different style weight settings.

I hope this clarifies the logic behind my code.

htoyryla avatar Feb 19 '17 09:02 htoyryla

@htoyryla, I was referring to those examples. Just a few tests with different styles and models; my mistake was that I didn't test other models at first and made wrong assumptions.

@ProGamerGov, looks awesome! Just to be sure: is line 101 supposed to be "content_img = lum_transform(style_img)"?

In the original code there is "imgs[cond] = lum_transform(imgs[cond])", which probably means that content_img should be converted from content_img, not from style_img.

Anyway, results look great!

Second, I think I've found a way to get rid of noise near the borders. It's very simple: add a 1-pixel gray border around the image at the very beginning, right after "local net = nn.Sequential()" and before the "nn.TVLoss" layer. Something like:

  if params.padding ~= 'default' then
    print('Padding image with zeroes')
    net:add(nn.SpatialZeroPadding(1, 1, 1, 1):type(dtype))
  end

Results (above are regular versions, below are with gray border, variants with "default-reflected-replicated" paddings):

z1-s4-illustration2vec3242-10-1000-0 57-def-refl-repl z1-b-illustration2vec3242-10-1000-0 57-def-refl-repl

And it doesn't seem to add noise to previously clean images, as far as I can tell:

z1-s2-crowsonkb-5-100-0 0-def-refl-repl

Disadvantage - gray border appears around the edge with some models, so it should be optional:

z1-s5-imagenetlall-10-1000-0 57-def-refl-repl

VaKonS avatar Feb 19 '17 10:02 VaKonS

It looks like there are many possible variant procedures based on Gatys' paper. For example, these lines in @ProGamerGov's code https://gist.github.com/ProGamerGov/ba9a9d54bae53e84ebf0116262df6758#file-lum_transfer-py-L98-L103 correspond to luminance-only style transfer (assuming the correction by @VaKonS above), but unlike my implementation, which is based on a paragraph in the paper quoted above, here the mean of the style image is additionally adjusted to match the content image.

My -histogram option then does that but also adjusts the variance according to formula 10 in the paper.

htoyryla avatar Feb 19 '17 10:02 htoyryla

@VaKonS isn't padding with zeroes actually padding with black pixels? It seems straightforward, though, to modify SpatialZeroPadding to pad with a given value, like 0.5 for gray.

htoyryla avatar Feb 19 '17 12:02 htoyryla

@htoyryla, nn.SpatialZeroPadding pads with zeroes, but the images are mean-centered first, therefore zeroes in the processing layers actually correspond to the images' mean values, which should be gray pixels in most cases. At least I think so.

upd. The mean values subtracted in "Neural-style" are not just "roughly gray", but exactly RGB = 123.68, 116.779, 103.939, if I'm not mistaken.
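A quick numpy sketch of that point (my illustration only): a zero border added to the mean-centred tensor is, back in pixel space, a border of exactly that mean colour.

import numpy as np

mean_rgb = np.array([123.68, 116.779, 103.939])   # means subtracted by neural-style (RGB order)

img = np.random.randint(0, 256, size=(4, 4, 3)).astype(float)
centered = img - mean_rgb                         # what the convolution layers actually see

# Zero-pad the mean-centred image with a 1-pixel border, then undo the centring:
padded = np.pad(centered, ((1, 1), (1, 1), (0, 0)), mode='constant')
restored = padded + mean_rgb

print(restored[0, 0])   # border pixel -> [123.68, 116.779, 103.939], i.e. the grayish mean colour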

VaKonS avatar Feb 19 '17 13:02 VaKonS

Oops... my mistake. Of course, when the padding is done inside the model the values are as you say. I was thinking in terms of Torch images where the values are between 0 and 1. I made the same mistake today working on a NoiseMask module which masks part of an image and replaces it with noise (but in that context it is not so critical... actually there is nothing wrong with the module, as the mean and std are parameters).

htoyryla avatar Feb 19 '17 15:02 htoyryla

Experimenting with my code, I have been thinking of what Gatys et al wrote: "If there is a substantial mismatch between the luminance histogram of the style and the content image, it can be help- ful to match the histogram of the style luminance channel LS to that of the content image LC before transferring the style."

I think here is an example. This is a style image which was itself created by neural-style from my own materials.

tytto5e

Using it as a style image without histogram matching can result in something like this

koe2a_240

while with histogram matching one can get something like this (not really like the style but still looks better)

koe2b_400

But it is not so obvious when histogram matching works and when not. From the same two images, with different settings, I get this with histogram matching (far too dark)

hannu9a

and now I get quite close to the original style without histogram matching

hannu9b

In these examples I did not use luminance transfer nor original_colors, just histogram on and off. My feeling based on this is that histogram matching does not necessarily help getting the exact look of the original style, but it can help getting a good balance between style and content in some problematic cases. And it can also be used in creating pictures with a strong interesting style which is noticeably different from the original style.

htoyryla avatar Feb 19 '17 17:02 htoyryla

@htoyryla, I thought that histogram matching tricks were to make stylized image use original content colors while keeping some elements from style whose colors were not present in content image.

p. s. It was my mistake with the image padding – I probably shouldn't have used "image padding", because it's "layer padding", as you have correctly noticed.

VaKonS avatar Feb 19 '17 19:02 VaKonS

Looking at @ProGamerGov's lum_transfer.py, which, if I understand correctly, is used to produce an image file for subsequent input into neural_style: in this process the image is stored as an image file (png?), so the question is whether it is safe to put, say, LUV image data into a png file and recover the correct data when it is read into neural-style later. I don't know about png, but it seems to me that it might be intended for greyscale or RGB only.

On the other hand... I cannot say that the whole process using python scripts together with neural-style is clear to me. Neural-style too expects RGB unless it has been modified. One could, I think, make luminance-only transfer by copying L into the R, G and B channels before saving the image file for neural-style (which is what my code does in effect inside neural-style, because the model expects RGB channels).
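That step is tiny in numpy terms; a sketch (file names are placeholders, and skimage's rgb2gray stands in for whatever luminance definition is actually used):

import numpy as np
from skimage.io import imread, imsave
from skimage.color import rgb2gray

lum = rgb2gray(imread('content.jpg'))            # 2-D luminance in [0, 1]
lum_rgb = np.stack([lum, lum, lum], axis=-1)     # copy L into the R, G and B channels
imsave('content_lum_rgb.png', (lum_rgb * 255).astype(np.uint8))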

htoyryla avatar Feb 19 '17 19:02 htoyryla

I thought that histogram matching tricks were to make stylized image use original content colors while keeping some elements from style whose colors were not present in content image.

Reading carefully, yes, that's what the paper says. My first example, I think, works exactly that way. In the second example, histogram matching gave too extreme results but leaving it off gave a very good (in my eyes) result preserving even the sense of depth, even if the original coloring is not preserved.

Personally, I lean towards original artistic application of these techniques, and don't look for a single perfect tool for everything, but rather a versatile toolbox. For an example of an interesting result using histogram matching see my cubistic portrait from yesterday. Funny... that too was made from the same photo as these examples today.

htoyryla avatar Feb 19 '17 19:02 htoyryla

So I have now made a refined version of my lum_transfer.py script: https://gist.github.com/ProGamerGov/2e7a0fe7a5ef6e117dc0be81df243331

Now the process only takes 4 commands (including neural_style.lua) to complete:

python linear-color-transfer.py --target_image fig3_style1.jpg --source_image fig3_content.jpg --output_image style_colored_pca.png

python lum_transfer.py --content_image fig3_content.jpg --style_image style_colored_pca.png --cp_mode lum --output_style_image output_lum_style_pca.png --output_content_image output_lum_content_pca.png --org_content fig3_content.jpg 

th neural_style.lua commands here...

python lum_transfer.py --output_lum2 out_lum6_test_400_r.png --cp_mode lum2 --output_image out_combined.png --org_content fig3_content.jpg

For lum_transfer.py, the outputs and required inputs are now dependent on the --cp_mode you choose. Though for --cp_mode lum2 and --output_lum2 there is an issue: one must either resize their Neural-Style output to match their original content image's size, or vice versa. Though I would imagine that in order to preserve the quality of your Neural-Style output, you would want to resize your content image.

Does anyone know how I can resize the content image in the script to match the size of the Neural-Style output image, in the Python code?

Edit: I think Gatys solves this issue like this:

hr_init = img_as_float(scipy.misc.imresize(lr_output, imgs['content'].shape))

Or this code has something to do with it:

for cond in conditions:
        imgs[cond] = img_as_float(imread(img_dirs[cond] + img_names[cond]))
        if imgs[cond].ndim == 2:
            imgs[cond] = tile(imgs[cond][:,:,None],(1,1,3))
        elif imgs[cond].shape[2] == 4:
            imgs[cond] = imgs[cond][:,:,:3]
        try:
            imgs[cond] = transform.pyramid_reduce(imgs[cond], sqrt(float(imgs[cond][:,:,0].size) / img_size**2))
        except:
            print('no downsampling: ' + img_names[cond])
        imshow(imgs[cond]);show()

Edit, this does not work:

import scipy

org_content = scipy.misc.imresize(output, org_content.shape)

ProGamerGov avatar Feb 19 '17 20:02 ProGamerGov

@ProGamerGov, here, for example, images are transformed like this:

from skimage import io, transform
im = io.imread(input_name)
im = transform.resize(im, (im.shape[0]*scale, im.shape[1]*scale), order=3)
io.imsave(output_name, im)

upd. This also works for me (note that "transform" must be imported from skimage, it seems, or scipy will not find its "misc" part):

import scipy
from skimage import io, transform
im = io.imread(input_name)
im = scipy.misc.imresize(im, (im.shape[0]*2, im.shape[1]*2))
io.imsave(output_name, im)

VaKonS avatar Feb 20 '17 00:02 VaKonS

@VaKonS This is the specific part of the script in which the resizing needs to take place:

elif cp_mode == 'lum2':
    output = args.output_lum2
    org_content = args.org_content
    org_content = imread(org_content).astype(float)/256
    output = imread(output).astype(float)/256

    # Resize "org_content" to match the size of "output".

    org_content = rgb2luv(org_content)
    org_content[:,:,0] = output.mean(2)
    output = luv2rgb(org_content)
    output[output<0] = 0
    output[output>1] = 1
    imsave(output_a_name, output)

Do I need to read both images before they are converted to the required float value?

ProGamerGov avatar Feb 20 '17 01:02 ProGamerGov

This is the control image with no modifications:

This the control image run through the last lum_transfer.py step:

This is the output I made following the steps I have outlined above:

This one is the same one as the above image, except for the fact that I swapped the content and style images for the first step, so linear-color-transfer.py transferred the color from the style image, to the content image:

I am not sure if it's the content and style images that I am using, but the luminance changes seem very subtle to me.

The full images can be viewed here: https://imgur.com/a/1k6nk

Edit:

This is what the output looks like when you skip the first linear-color-transfer.py step:

ProGamerGov avatar Feb 20 '17 01:02 ProGamerGov

Maybe it is just the style image's luminance?

Following the luminance transfer steps with this new style image results in this:

Control Test with the last lum_transfer step and -original_colors 1:

Looking at the images side by side (control on the right):

The difference is much more apparent with this style image. For example, the lighting on the grass is different between the two images.

Full versions of these images can be found here: https://imgur.com/a/5daEM

Here's the comparison with linear-color-transfer.py --mode pca:

ProGamerGov avatar Feb 20 '17 02:02 ProGamerGov

@ProGamerGov, scipy.misc.imresize seems to automatically convert the output to an array of [0...255] integers, therefore if you need floats, you'll have to resize the image first, and then convert it with ".astype(float)/256".

In contrast, it looks like skimage.transform.resize accepts and returns an array of floats in [0...1], so with it you'll need to convert to float/256 first, and resize after that.
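A small sketch of that difference, as I understand those two functions (scipy.misc.imresize existed in the scipy versions of that era but has since been removed):

import numpy as np
import scipy.misc
from skimage import transform, img_as_float

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)

# scipy.misc.imresize returns a uint8 array in [0, 255]: resize first, convert to float afterwards.
a = scipy.misc.imresize(img, (16, 16)).astype(float) / 255.0

# skimage.transform.resize works with floats in [0, 1]: convert first, then resize.
b = transform.resize(img_as_float(img), (16, 16))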

p. s. Shouldn't image arrays be divided by 255, by the way?

VaKonS avatar Feb 20 '17 03:02 VaKonS

These are the results of my luminance tests for iteration count and -image_size:

Full image here (with labels): https://i.imgur.com/xLLukch.png Full image here (without labels): https://i.imgur.com/DY4dvt9.png From this album of comparison images: https://imgur.com/a/eoYMf

In this set of comparison images, the effect of luminance transfer/color control is more visible. I used a variation of this neural_style.lua command (-original_colors 1 for the control tests):

th neural_style.lua -original_colors 0 -image_size 1000 -content_weight 1 -style_weight 1e3 -output_image out_lum8_test.png -content_image output_lum_content_pca.png -style_image output_lum_style_pca.png -num_iterations 1500 -save_iter 50 -print_iter 50 -seed 876 -init image -backend cudnn -cudnn_autotune

ProGamerGov avatar Feb 20 '17 06:02 ProGamerGov

@VaKonS Thanks for the help, using skimage.transform.resize did the trick:

org_content = skimage.transform.resize(org_content, output.shape)

https://gist.github.com/ProGamerGov/2e7a0fe7a5ef6e117dc0be81df243331

The script now doesn't require the user to do any manual resizing.

ProGamerGov avatar Feb 20 '17 07:02 ProGamerGov

I repeat my question about @ProGamerGov's process. As I understand it, this process mixes python scripts with neural-style: in order to do luminance-only transfer, a python script produces the luminance-only copies of the images, which are then supposed to undergo style transfer in plain neural-style. There are some problems in this process as I see it, and I am not sure whether they have been addressed properly.

How do we get these luminance-only images into neural-style?

How do we get neural-style to understand that the input is luminance-only and not RGB, and further, how does neural-style process luminance-only data when the model expects three channels (RGB)?

If we just save the output of luminance conversion into a png file, then an unmodified neural-style will read it assuming RGB and the result will be something quite different from luminance-only transfer.

The simple solution would be, I think, to save the luminance-only images as RGB so that the L channel is copied, possibly scaled, into the R, G and B channels. This is what I do in my modified neural-style.

PS. As to my earlier note wondering whether png can be used to contain yuv data instead of rgb (assuming that the receiving end knows that the contents are yuv instead of rgb), I made a test converting to yuv, saving into png, reading back, converting into rgb and saving again. There was an easily noticeable color shift.

orig

final
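A sketch of what I think is going on (illustration only; 'photo.jpg' is a placeholder): the U and V channels can be negative, so storing the YUV planes in an ordinary 8-bit image clips them, and decoding the file "as YUV" no longer recovers the original colours.

import numpy as np
from skimage import io, color, img_as_float

rgb = img_as_float(io.imread('photo.jpg'))
yuv = color.rgb2yuv(rgb)                                   # U and V can be negative

# Store the YUV planes in an ordinary 8-bit PNG (clipping/quantising them)...
io.imsave('yuv_as_png.png', (np.clip(yuv, 0, 1) * 255).astype(np.uint8))

# ...then read the file back, treat it as YUV, and convert to RGB again:
back = color.yuv2rgb(img_as_float(io.imread('yuv_as_png.png')))
io.imsave('roundtrip.png', (np.clip(back, 0, 1) * 255).astype(np.uint8))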

htoyryla avatar Feb 20 '17 17:02 htoyryla

@htoyryla

python scripts with neural-style, in order to do luminance only transfer

Looking at the naming of the feature in Gatys' code, he calls it "Color Control", which leads me to believe it's not just about the luminance transfer. His code does not produce the same outputs as those in the research paper, which makes me wonder if the changes are intentional, as the code seems to produce the same outputs for the other features in the research paper.

For instance, when comparing the results of your neural_style.lua modifications to his results, it's like the only difference is that he replaced the color white with the color yellow:

Results with your code:

From the research paper:

As to my earlier note wondering whether png can be used to contain yuv data instead of rgb (assuming that the receiving end knows that the contents are yuv instead of rgb), I made a test converting to yuv, saving into png, reading back, converting into rgb and saving again. There was an easily noticeable color shift.

So would these results indicate that png can be used to contain yuv data instead of rgb?

ProGamerGov avatar Feb 20 '17 19:02 ProGamerGov

So would these results indicate that png can be used to contain yuv data instead of rgb?

My result shows that using png to contain YUV changes the color content. The color nuances in the original image have been replaced by shades of pink when the image has passed from rgb -> yuv -> png -> read as yuv -> rgb.

But using png to carry YUV content is not the main issue; there is also the more serious issue that neural-style and VGG are designed to work with RGB, not YUV (my solution to which is converting Y to RGB by setting R,G,B = Y).

You seem to suggest that Gatys' code implements some other scheme than described in the paper. It can well be, but I am not convinced that your approach duplicates his thinking, if you cannot even explain how your process handles the problems I mentioned.

After all, I was just raising some valid questions about the process, and to me nicely looking results are not a proof that the process is correctly implemented. Making correctly working software requires an understanding of both the process and all details of the implementation. If the implementation is divided between separate programs, one needs to understand how to make them working correctly together and passing information correctly between them. For instance, in this particular case one needs to be aware in which format the image is being processed at any point in the process.

It seems to me that I am wasting my time now, so I'll most likely leave this thread.

htoyryla avatar Feb 20 '17 19:02 htoyryla

@htoyryla, @ProGamerGov, if I'm not mistaken, the code actually does not save YUV values in PNGs. Line 126: "output = luv2rgb(org_content)" converts the LUV colorspace back to RGB before saving.

VaKonS avatar Feb 20 '17 23:02 VaKonS

I think that at least two functions from "Controlling Perceptual Factors in Neural Style Transfer" can already be used in Neural-style, improving the quality of results and not overcomplicating it:

  • luminance-matched style transfer with original colors (like if artist drew a picture in his unique manner, but used original-colored paints) – it can be made as extended "-original_colors" option;
  • padding option, allowing to improve the quality of edges in rendered image.

Though both need to be polished, tested and reimplemented in Torch to use in Neural-style.

VaKonS avatar Feb 20 '17 23:02 VaKonS

@VaKonS Yes, the code converts the custom grayscale images back to RGB after working with them in the LUV colorspace. I am currently finalizing my analysis of how the scripts work in relation to Gatys' examples and Neural-Style's code.

ProGamerGov avatar Feb 21 '17 00:02 ProGamerGov

You seem to suggest that Gatys' code implements some other scheme than described in the paper. It can well be, but I am not convinced that your approach duplicates his thinking, if you cannot even explain how your process handles the problems I mentioned.

@htoyryla Sorry, I should have dug into the inner workings of the how, and why of the script far sooner. My bad.


**The following assumes that I have correctly implemented Gatys' code in both scripts. I used ImageMagick to analyze both the inputs before they were run through the scripts, and the outputs after they were run through the scripts:**

The linear-color-transfer.py script makes the RGB channel means, standard deviations, and overall mean/standard deviation extremely close to those of your chosen source image. Min, max, and gamma remain unaffected. With this in mind, it does not seem like this script is crucial to the function of the luminance transfer process, though it does seem to make the style image's colors work better for style transfer, as seen above in the experiments with Picasso's Seated Nude artwork.

As for the lum_transfer.py script, things are more complicated than they first appear:

There are many different methods and meanings of the word "grayscale", but I think that Gatys is using a custom method that focuses on luminance. Grayscale images only contain shades of grey, and no color. Each pixel has a "luminance" value. Other names for this "luminance" value are "intensity" and "brightness".

This means that a grayscale image in RGB form represents the luminance of the image, so one can run images representing luminance through Neural-Style. The scripts therefore always output in the RGB color space, even though they work with both a grayscale representation and the LUV color space.

I can break the lum_transfer.py and linear-color-transfer.py code by giving either script a grayscale image made with every grayscale algorithm I have tried. This is because Gatys is not using the rec709luma algorithm, or any other algorithm known to ImageMagick. ImageMagick lists the intensity value of the script's grayscale output (in RGB form) as "Undefined", while the intensity value for my control test is listed as "rec709luma".

The lum_transfer.py script's lum mode first converts your input image from the RGB color space to a grayscale format, using grayscale as a way to preserve/focus on the luminance of the input image.

The lum_transfer.py script's lum2 mode takes the RGB output from Neural-Style (produced from the two grayscale images you ran through Neural-Style), along with your original RGB content image. Your original content image is converted from the RGB color space to the LUV color space; this is what the rgb2luv function does. The script then replaces the luminance (L) channel of the content image, in LUV form, with the luminance of the grayscale output image and converts the result back to RGB; this is what the luv2rgb function does. So the colors basically come from the content image's U and V channels, while the luminance comes from the grayscale output image.

In summary, the Python scripts I have made, and Gatys' code, are using a gray scale color space to represent luminance, in the RGB color space to bypass the limitations of the pre-trained caffemodel. In order to add the color back to the gray scale luminance output, the LUV color space is used along with the original content image.

Now, the question is: are the differences between the Python-script-aided outputs and Gatys' outputs because of the neural_style.lua parameters I used, or am I missing a step in the luminance transfer process?


I am still learning how histograms work, so I don't know exactly where, if at all, histogram matching plays a role in what I described above. I may also have missed some things, so please feel free to let me know where I messed up.

ProGamerGov avatar Feb 21 '17 00:02 ProGamerGov

Comparing what I have learned/figured out, to what the research paper says, I think the linear-color-transfer.py script is for histogram matching:

If there is a substantial mismatch between the luminance histogram of the style and the content image, it can be helpful to match the histogram of the style luminance channel LS to that of the content image LC before transferring the style. For that we simply match mean and variance of the content luminance.

I think that the lum mode of the lum_transfer.py Python script is responsible for this luminance histogram matching described by the research paper.

For the linear-color-transfer.py Python script, this is the relevant part of the research paper:

The one choice to be made is the colour transfer procedure. There are many colour transformation algorithms to choose from; see [5] for a survey. Here we use linear methods, which are simple and effective for colour style transfer. Given the style image, each RGB pixel p_S is transformed as:

p_S' ← A p_S + b

where A is a 3 × 3 matrix and b is a 3-vector. This transformation is chosen so that the mean and covariance of the RGB values in the new style image match those of the content image [11] (Appendix B). In general, we find that the colour matching method works reasonably well with Neural Style Transfer (Fig. 3(e)),
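A simplified numpy stand-in for such a transform (not the actual linear-color-transfer.py, which offers pca/chol/sym variants): choose A and b so that the recoloured style pixels get the mean and covariance of the content pixels.

import numpy as np
from scipy.linalg import sqrtm

def linear_color_transfer(style, content, eps=1e-5):
    """Recolour `style` (HxWx3 floats in [0,1]) so its RGB mean/covariance match `content`."""
    s = style.reshape(-1, 3)
    c = content.reshape(-1, 3)
    mu_s, mu_c = s.mean(0), c.mean(0)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)
    cov_c = np.cov(c, rowvar=False) + eps * np.eye(3)
    A = np.real(sqrtm(cov_c)) @ np.linalg.inv(np.real(sqrtm(cov_s)))   # one valid choice of A
    b = mu_c - A @ mu_s
    return np.clip(s @ A.T + b, 0.0, 1.0).reshape(style.shape)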

Then from the comparison between the two methods:

The colour-matching method is naturally limited by how well the colour transfer from the content image onto the style image works. The colour distribution often cannot be matched perfectly, leading to a mismatch between the colours of the output image and that of the content image.

In contrast, the luminance-only transfer method preserves the colours of the content image perfectly. However, dependencies between the luminance and the colour channels are lost in the output image. While we found that this is usually very difficult to spot, it can be a problem for styles with prominent brushstrokes since a single brushstroke can change colour in an unnatural way. In comparison, when using full style transfer and colour matching, the output image really consists of strokes which are blotches of paint, not just variations of light and dark. For a more detailed discussion of colour preservation in Neural Style Transfer we refer the reader to a technical report in the Supplementary Material, section 2.1.

I am not sure where to find the supplementary material?

Edit: http://bethgelab.org/media/uploads/stylecontrol/supplement/

Second Edit: The supplementary material contains all the raw images that were used in the research paper. This in addition to a lot more examples and details.

There also appears to be a tool that's made to examine what the different layers of a style image look like. This appears to be what the -aesth_input command in his code is for.

ProGamerGov avatar Feb 21 '17 01:02 ProGamerGov

@VaKonS I was mainly looking at lum_transform when cp_mode = lum, which I understood to correspond to luminance-only transfer (with the addition of adjusting the mean). But I must say I have not had time to look deeply enough; I just raised a concern about an additional complexity when using this kind of mixed approach.

Finally looked more deeply... lum_transform produces a monochrome image, so there is no problem: neural-style will internally make a 3xHxW tensor in which all channels are copies of the monochrome image. It is just that neural_style will not be able to restore original_colors, because it only gets monochrome images, but of course that can be solved by adding another script to the pipeline.

I apologize for commenting... I was just following my former R&D leader role instincts when this started growing more and more complex. I'll ignore this thread from now on... haven't got the time to follow closely enough and the structure of the solution isn't making it any easier.

My concern was thus limited to program implementation, not to theoretical issues concerning different color spaces. This is because in my view the program needs to be solidly implemented before there is any point starting exploring further improvements on more theoretical level.

I think this all comes down to my preference for neural-style, torch and lua, as well as simplicity and clarity, as a basis for development.

htoyryla avatar Feb 21 '17 04:02 htoyryla

Pulling images directly from Gatys' code (the iPython notebook example), the content images before the style transfer process are exactly the same as the ones made by --cp_mode lum:

These are the outputs from the lum_transfer.py script for comparison (ignore the size difference mistake):

After style transfer, you get pretty much the same output as with Neural-Style:

For adding the color to the output image, lum_transfer.py and Gatys' code both do exactly the same thing (script on the left, Gatys' iPython notebook example on the right):

Full image size album here: https://imgur.com/a/HSdcj

Seeing these outputs, I do believe I have replicated Gatys' luminance transfer code in Neural-Style by using the lum_transfer.py script. The differences between his outputs and mine arise both from differences in the style transfer parameters and from the fact that he uses a two-step process.

@htoyryla I think I have, to the best of my knowledge, explained how the process works, and now I have shown that there are no missing steps in the process. So I think it is safe to say that I have gotten luminance transfer working in Neural-Style, though I'll have to play around with your luminance modifications for neural_style.lua some more.

ProGamerGov avatar Feb 21 '17 08:02 ProGamerGov

My problem is that with the limited time I have, I cannot follow your process as it is evolving. Like when you now say "after style transfer, you get pretty much the same output as with Neural-Style", I am confused because I believed your process used neural-style for style-transfer proper (which was the basis for my recent concerns).

Therefore, no point in my continuing to participate. The early part of the thread gave me important impulses but now I think my participation is counterproductive for all of us.

PS.

are using a gray scale color space to represent luminance, in the RGB color space to bypass the limitations of the pre-trained caffemodel. In order to add the color back to the gray scale luminance output, the LUV color space is used

That is pretty much what my neural-style version does, too, with the steps inserted into the appropriate places in neural-style. Perhaps it is only that the way you have arranged everything does not work well with my intuition. Like running the same script with cryptic option names to do different things in different parts of the process. That I am not so familiar with numpy does not help either.

The part "limitations of the pre-trained caffemodel" is really not fair. Any trained model has to be based on some constraints on the format of the data, RGB being the most natural choice in my view, and monochrome (greyscale) image can easily be represented in RGB just like we both have done. It was only that my not being fluent in reading numpy made me miss that you too had done so.

htoyryla avatar Feb 21 '17 08:02 htoyryla

@htoyryla

Like when you now say "after style transfer, you get pretty much the same output as with Neural-Style", I am confused because I believed your process used neural-style for style-transfer proper.

Sorry, my bad. I was referring to running Gatys' code from the iPython notebook example and how it compared with doing the same in Neural-Style + the Python scripts.

Therefore, no point in my continuing to participate. The early part of the thread gave me important impulses but now I think my participation is counterproductive for all of us.

The luminance stuff is pretty much done, but the next feature (and final feature I think?) is called Spatial Control, and it looks as though I need to modify the deeper levels of neural_style.lua, which you know far better than I do, due to your own experimentation with the code. I previously thought I could use some simple Python code for masks, but upon examining the supplementary material for the research paper, it appears I cannot do things in Python.

ProGamerGov avatar Feb 21 '17 08:02 ProGamerGov

Using masks in torch-based style processing is of interest to me, and it would be quite simple to add a mask before the gram matrix. But full spatial control, as I understand it, would require the use of multiple gram matrices, each with its own mask. I faintly recall that Gatys put the multiple gram matrices into a tensor, adding just one more dimension, which sounds simple enough in principle. That also feels interesting for my own purposes, but it is not something to do in one hour, so it'll have to wait.
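As a rough numpy sketch of the "mask before the gram matrix" idea (an illustration of the principle, not Gatys' implementation): each spatial guidance mask is multiplied into the feature maps before that region's Gram matrix is computed, giving one Gram matrix per region.

import numpy as np

def masked_gram(features, mask):
    """features: C x H x W feature maps; mask: H x W guidance mask in [0, 1]."""
    c, h, w = features.shape
    f = (features * mask[None, :, :]).reshape(c, h * w)   # zero out features outside the region
    return f @ f.T / (mask.sum() + 1e-8)                  # normalise by masked area (one possible choice)

# One Gram matrix per guidance region, stacked along a new first dimension:
feats = np.random.rand(64, 32, 32)
masks = [np.random.rand(32, 32) > 0.5]
masks.append(~masks[0])                                   # complementary region
grams = np.stack([masked_gram(feats, m.astype(float)) for m in masks])   # R x C x C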

Using masks for spatial control is also not so easy if you have to make the masks manually. I know because I have used neural-doodle for semantic style transfer. To be really useful, one would need an automatic solution for creating the masks (belonging to the general category of image segmentation). I had a look at one promising solution, but the code was in matlab; the model was caffe but used custom layers not recognized by loadcaffe, so using it would have required a custom build of caffe to run the model for making the masks, and I didn't go further with it.

htoyryla avatar Feb 21 '17 08:02 htoyryla

@ProGamerGov, @htoyryla, I think I've made "match_color" in Torch, in case someone needs it: https://github.com/VaKonS/neural-style/blob/7682e2e24b650206afeddc576a6ca0778d11260c/match_colors.lua

The results seem to be byte-identical to those of ProGamerGov's script.

Does anybody know how to remove commits from a pull request with the browser interface?

VaKonS avatar Feb 21 '17 21:02 VaKonS

This might be of interest https://arxiv.org/pdf/1701.08893v1.pdf

htoyryla avatar Feb 22 '17 06:02 htoyryla

Assuming that match_color is meant to adjust the style image before style transfer (which will then be in color), I added it to my code: https://gist.github.com/htoyryla/9ee49c5ff38dda7d0907b6878c171974

Use the parameter -histogram matchcolor (-transfer should be left at color and -original_colors, I believe, should be off).

This expects to find match_color in a file match_colors.lua in the same directory; use @VaKonS's code but remove everything other than the function.

This should also work with multiple style images but I have not tested it.
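For example, a run could look something like this (the -histogram and -transfer flags exist only in this modified script; file names are placeholders):

th neural_style.lua -content_image content.jpg -style_image style.jpg -output_image out.png -histogram matchcolor -transfer color -original_colors 0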

Here's a sample output using neural-style default and Gatys' VGG_ILSVRC_19_layers_conv.caffemodel

out

Note to myself: If both -histogram matchcolor and -transfer lum are set, the images are first modified to greyscale and the color matched, which probably fails because of tensor size mismatch. It might be more interesting to change this order, so that color matching is done first and then -transfer lum would perform a luminance-only style transfer. Maybe one could use match_color also as an optional method for originalcolor.

htoyryla avatar Feb 22 '17 19:02 htoyryla

@ProGamerGov, I saw in email your attempt to add the loss module that Gatys used. It looks like Gatys' code is based on the original neural-style, where the loss modules get the targets as input. The new neural-style code (updated in December 2016, I think) works differently: the loss module must implement two modes:

  • 'capture' in which it captures the target by its own
  • 'loss' in which it evaluates loss

So what you were trying to do might, with good luck, work in the pre-12/2016 code.
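For illustration only, the two-mode structure looks roughly like this (sketched in Python here, since the thread's helper scripts are Python; the real modules are torch/nn Lua code inside neural_style.lua):

import numpy as np

def gram(features):
    """features: C x H x W feature maps -> C x C Gram matrix."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

class StyleLoss:
    def __init__(self, strength=1.0):
        self.strength = strength
        self.mode = 'none'       # 'capture' stores the target, 'loss' evaluates against it
        self.target = None
        self.loss = 0.0

    def forward(self, features):
        g = gram(features)
        if self.mode == 'capture':
            self.target = g                                        # capture the target by itself
        elif self.mode == 'loss':
            self.loss = self.strength * float(((g - self.target) ** 2).mean())
        return features                                            # acts as a pass-through layer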

htoyryla avatar Feb 23 '17 06:02 htoyryla