chainer-fast-neuralstyle
remove the noise from the generated image
When I use the model to generate a stylised image, I run into two problems.
1. The result has many regular dots.
Just like in the image: there are many regular square dots which I want to remove. Is there any method to do so?
2. The noise cannot be removed, no matter how long I train.
This image was trained for 5 epochs. I have tested the generated image every 20000 iterations, and the noise is always there. I don't know how to remove it. Could anyone help me?
All parameters that I use: image_size = 512, tv_lambda = 0.001, with the ImageNET val dataset as training data.
These are the two problems I have found so far. I am thankful to anyone who can help. Thanks very much.
This is the style image:
@yusuketomoto Could you give me some advice? Thanks very much.
It's a pretty good result. I don't think you'll get better without more training images... but I'm just learning this as well.
As for the patterns, they could be fixed in post-processing.
@artnose Thanks for your advice, but I'm still confused.
Did you mean that I should train with many different style images and pick the best model? Unfortunately, I have trained many models with different style images, and they all show the 'dots' and the noise.
So, is it a problem in the code? Or do you have other advice?
Thanks.
@jackieyung https://github.com/yusuketomoto/chainer-fast-neuralstyle/commit/528f8fb4dc4f1a93af29d583fe75acf6de182c44 might improve the dot noise issue. Please try it.
@yusuketomoto Thanks a lot. I'll try it right now.
I think the noise that you've posted as picture 2 is caused by the network producing an output so strong that it exceeds the (0, 255) color range. The VGG network has been trained to handle only inputs in the (0, 255) range. So if you feed 500 into the VGG network, you'll probably get a rather strong response instead of it treating your 500 like 255 = white. Similarly, no image can contain a pixel that is blacker than 0 = black. So maybe if the model being trained erroneously generates a -255, +255 pattern, it will cancel out inside the VGG network, but it will get clamped to 0, 255, i.e. alternating black and white, during export.
So my suggestion to fix this would be to clamp the output of your model to the (0, 255) range during training, so that the VGG network always gets the same values it would get if your image was read from a file. Due to the -120 mean subtraction in the VGG net, you actually need to clamp to (-120, 135).
diff --git a/train.py b/train.py
index 5d3269c..0b75f53 100755
--- a/train.py
+++ b/train.py
@@ -125,6 +125,8 @@ for epoch in range(n_epoch):
         x = Variable(x)
 
         y = model(x)
+        y = F.minimum(y, xp.ones(y.shape, dtype=xp.float32) * 135)
+        y = F.maximum(y, xp.ones(y.shape, dtype=xp.float32) * -120)
 
         feature = vgg(xc)
         feature_hat = vgg(y)
--
And for your noise type 1, I believe this is the color channels interfering with each other, i.e. a -255, 255 pattern in the red channel could cancel out a 255, -255 pattern in the blue channel. For me, the patch from @yusuketomoto fixed that issue :)
@fxtentacle Thanks for your suggestion. But...
For noise type 2, the model already generates an image whose values range from 0 to 255; see the code https://github.com/yusuketomoto/chainer-fast-neuralstyle/blob/master/net.py/#L66 .
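For reference, the output scaling in question looks roughly like this (paraphrased, not copied verbatim; since tanh is bounded to (-1, 1), the generator output is bounded to (0, 255) by construction):

y = (F.tanh(self.d3(h)) + 1) * 127.5  # tanh in (-1, 1) maps to (0, 255)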
For noise type 1, could you give me a link to the patch that fixed the issue?
Thanks a lot.
For type 1 I meant this commit: https://github.com/yusuketomoto/chainer-fast-neuralstyle/commit/528f8fb4dc4f1a93af29d583fe75acf6de182c44 but sadly, in my tests, it removes the dots only to replace them with diagonal lines.
For noise type 2, you are correct. Sorry, I didn't consider that the tanh would limit everything. That said, I'm getting very good results on this front by using a loss function that penalizes values close to the borders of the RGB range.
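A minimal sketch of what such a penalty could look like (my own illustration, not code from this repository; range_penalty and lambda_range are made-up names, and the (-120, 135) bounds are the VGG input range after mean subtraction, as discussed above):

import chainer.functions as F

def range_penalty(y, lambda_range=1e-3):
    # Penalize output values near the borders of the (-120, 135) range.
    # Values well inside the range cost nothing; only the outer 10%
    # of the range is penalized, quadratically.
    center = 7.5   # midpoint of (-120, 135)
    half = 127.5   # half-width of (-120, 135)
    overshoot = F.relu(F.absolute(y - center) - 0.9 * half)
    return lambda_range * F.sum(overshoot ** 2)

This term would simply be added to the existing feature, style, and TV losses.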
The fact that the patch for type 1 noise merely made the noise change its shape tells me there must be a strong pressure for the network to learn high-level neurons that produce a very strong shape given very little input.
Like I just documented in #69, the paper suggests reflection-padding the input and then cropping the model output before VGG evaluation and loss calculation. You probably also noticed that every generated image has repetitive patterns at the border. I believe that is because, due to the padding, the recognition part of the model is being fed 0 values, yet the synthesis part of the model still needs to produce output that is contextually close to the original image, as judged by VGG.
Or in other words: around the border, the residual layers have pretty much only zeros to work with, but the output image still needs to have colors and shapes. So this would be exactly the condition where there is strong pressure on the network to learn "0 to strong pattern" responses in the higher deconvolution layers, meaning this could create a learning pressure that circumvents the TV regularization.
I already verified my hypothesis by using many different filter kernels for the TV regularization function, yet in every case the network learns to produce repetitive patterns that circumvent whatever my TV function is penalizing.
So my new working guess is that we'll need to reduce padding and crop the border artifacts away before vgg(y). That, of course, also implies cropping xc so that the size of vgg(xc) continues to match.
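As a rough sketch of what I mean (against the training loop in train.py; the crop margin of 16 pixels is an arbitrary assumption, and older Chainer versions may need F.get_item instead of direct Variable slicing):

pad = 16  # assumed width of the border region to discard

y = model(x)
y_crop = y[:, :, pad:-pad, pad:-pad]    # drop the border artifacts
xc_crop = xc[:, :, pad:-pad, pad:-pad]  # crop the content target to match

feature = vgg(xc_crop)
feature_hat = vgg(y_crop)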
To verify the border guess, I trained a model with cropping and then applied it without cropping. Already after 1100 iterations, the model has completely "un-learned" the features necessary to synthesize the border. I think this confirms that including the border produces an inappropriately strong pressure to learn features that are useless for every other part of the image.
For testing my theory on noise type 2, I artificially introduced 0-values into the model after c3. As expected, very strong patterns show up at the transition from normal values to the 0 spot. So it seems that noise type 2 is caused by areas inside the image having a low activation after c3, which resembles the activation around the border of the image; that is why we see repetitive patterns not only at the border, but also in the noise type 2 areas. Based on the fact that these patterns always have a repetition length of 4 pixels, I would guess that the cause lies before the two deconvolution layers with stride 2 each.
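In case anyone wants to reproduce that test, this is roughly how a zero patch can be injected (a sketch only; the exact layer names and the patch location/size are assumptions about net.py):

# inside the generator's forward pass, right after the c3 block:
h = self.b3(F.elu(self.c3(h)), test=test)  # existing line, roughly
h.data[:, :, 20:28, 20:28] = 0             # artificially zero out a small patch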
@fxtentacle @yusuketomoto
I trained a model using the changed total variation code (https://github.com/yusuketomoto/chainer-fast-neuralstyle/commit/528f8fb4dc4f1a93af29d583fe75acf6de182c44). The stylized image looks much better; the 'dots' noise and the overexposure noise seem to have disappeared. The stylised image now looks like:
The parameters that I used: tv_lambda = 1e-4, image_size = 512, lr = 1e-3, 90000 iterations.
I am doing more experiments, hoping to get equally good results with different style images.
Thanks very much for your code and suggestions @yusuketomoto
Thanks a lot for your suggestions and experiments @fxtentacle
Wow, that's a great result :) I just fell into another gotcha: it seems the appropriate lambda_tv value depends on the image size.
@fxtentacle Could you share some sample images illustrating the correlation between lambda_tv and image_size? I'm extremely curious about it.
Thanks.
Just by looking at the function F.sum(F.convolution_2d(x, W=wh) ** 2) + F.sum(F.convolution_2d(x, W=ww) ** 2)
you can see that the results of the convolution are squared and summed up, but never divided by the image size. Hence, a 512x512 pixel image will produce 4 times the loss of a 256x256 image for the same noise pattern, if said pattern extends over the entire image.
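A sketch of a size-independent variant (my assumption of the fix, not code from the repository; wh and ww are the difference kernels from the linked commit):

import numpy as np
import chainer.functions as F

def total_variation_normalized(x, wh, ww):
    # Dividing by the pixel count makes the same noise pattern yield
    # the same loss at 256x256 and 512x512, so lambda_tv no longer
    # depends on the training image size.
    tv = F.sum(F.convolution_2d(x, W=wh) ** 2) + F.sum(F.convolution_2d(x, W=ww) ** 2)
    return tv / float(np.prod(x.shape))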
An alternative solution for noise type 1 by postprocessing: https://github.com/yusuketomoto/chainer-fast-neuralstyle/issues/33#issuecomment-254011634 A bit cumbersome, but it doesn't require retraining the model.
@jackieyung for that model, which lambda_feat and lambda_style did you use?
@fxtentacle lambda_feat = 1.0, lambda_style = 10.0
@6o6o Great job! But it didn't get rid of the 'overexposure' noise; see the bottom-right corner of the image. Thanks. https://cloud.githubusercontent.com/assets/13799992/19413252/8eb190c4-9330-11e6-88a7-00494cbb5583.png
@jackieyung yes, you are right. This bit is tricky to fix with postprocessing. A more reasonable solution would be to eliminate the source of the problem by improving the implementation itself.
In addition to the things I reported before, I just found another cause of noise for very bright or very dark areas: the Chainer batch normalization function will always use the current image's mean and variance. This means that if you send in an image that is mostly white with something in the middle, the result might have "too strong" features, because the variance would be low (due to the mostly white image), and everything then gets upscaled in the residual layers before generating the final image. One possible fix is to use a different kind of normalization function, or to learn the batch normalization mean and variance during training, save those values, and run image generation with the fixed mean/variance (by passing test=True).
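For illustration, generation with the stored statistics would look roughly like this (assuming the model's __call__ accepts and forwards a test flag to its BatchNormalization layers):

# at generation time: use the running mean/variance recorded during
# training instead of the current image's own statistics
y = model(x, test=True)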
That would make sense, because I was trying to learn the style of a black-ink-on-white-paper drawing and it never worked.
@jackieyung FYI, @6o6o posted a great post-processing Python script here
@isomerase thanks for your approval, but it's no longer relevant. @yusuketomoto updated the project with the resize-convolution technique, which produces much more appealing results at generation time.