
About the result

austingg opened this issue 8 years ago · 64 comments

First, awesome work; this seems to be the first re-implementation of the paper.

Currently, the results seem a little worse than vanilla neural style or the results in the "perceptual loss" paper. Maybe the low TV weight causes the noisy, grid-like results. Are all the hyper-parameters the same as the paper's?

austingg · Apr 12 '16

Thanks.

Yes, I know my results are a little blocky and noisy. The model this repo contains was not trained with the same hyper-parameters as the paper's. I didn't know the exact values of lambda_feat and lambda_style when I trained the model, because I couldn't find them in the paper. But I found them here, and I'm going to update the repo!

It seems the lambda_style I used was too large. I also think a uniform lambda_style weight might not be good, because I found that the gram_matrix losses are quite different for each layer.
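For reference, here is a minimal sketch of what a per-layer weighted style loss could look like in Chainer; the layer names and weight values below are illustrative, not the repo's actual settings.

```python
import chainer.functions as F

def gram_matrix(y):
    # y: feature map of shape (batch, channels, height, width)
    b, ch, h, w = y.shape
    features = F.reshape(y, (b, ch, h * w))
    # normalized Gram matrix; its distance to the style image's Gram
    # matrix is the per-layer style loss
    return F.batch_matmul(features, features, transb=True) / (ch * h * w)

# hypothetical per-layer weights replacing a single uniform lambda_style
style_weights = {'relu1_2': 1.0, 'relu2_2': 1.0, 'relu3_3': 0.5, 'relu4_3': 0.25}

def style_loss(features_hat, grams_style):
    # features_hat: layer name -> features of the generated image
    # grams_style: layer name -> precomputed Gram matrices of the style image
    loss = 0
    for layer, w in style_weights.items():
        loss += w * F.mean_squared_error(gram_matrix(features_hat[layer]),
                                         grams_style[layer])
    return loss
```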

yusuketomoto · Apr 14 '16

Yes, there are many hyper-parameters, going all the way back to the seminal work "A Neural Algorithm of Artistic Style". Currently most re-implementations follow uniform lambda_style weights. I have done lots of experiments based on justin's implementation. It seems your current result only learned the low-level style, e.g. color, and not higher-level style such as strokes. Also, in the reddit thread, justin offers supplementary material with more details; it may be helpful for you.

austingg · Apr 15 '16

@yusuketomoto any news? What weights have you tried?

I tried lambda_feat=1.0 with lambda_style from {1.5, 3.0, 5.0, 7.0, 10.0}, with all other weights at their defaults. All results are noisy and have these vertical stripes. How did you manage to train the default starrynight.model that is included in this repo?

dionysio · Apr 15 '16

Could it be that your convolution kernel size is 4? The original VGG uses 3, which should give more precise graphics.
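To make the comparison concrete, kernel size is just the ksize argument of Chainer's convolution link; the channel counts, stride, and padding below are placeholders, not the repo's actual layers.

```python
import chainer.links as L

# a 4x4 kernel as speculated above, versus a VGG-style 3x3 kernel
conv_k4 = L.Convolution2D(3, 32, ksize=4, stride=1, pad=1)
conv_k3 = L.Convolution2D(3, 32, ksize=3, stride=1, pad=1)
```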

graphific · Apr 24 '16

Thanks for sharing the code. I trained with the default parameters in this repo on the MS COCO 2014 dataset and tried several style images, including starrynight; the results are all noisy and weird. Could anyone reproduce the default starrynight.model in this repo?

mionyu · May 24 '16

@mionyu I'm getting the same here. I've trained on several pictures and am getting unusable results. I'm getting a strange frame effect on all the models, no matter what the source is. I really wanted to use this in an art project next weekend, but unfortunately the results are not acceptable at all. See the attached examples. Any help is welcome. I've used the standard command line without altering any parameters.

valentinvieriu · Jun 27 '16

@valentinvieriu Try batchsize 1; you might get a better result. I'm not confident, but I suspect there are bugs in batch_matmul or in the style loss computation with batches.

(example outputs attached: mod_15000, muse_s_15500)

yusuketomoto · Jun 27 '16

@yusuketomoto Hello.

Thank you for a nice implementation.

I have one question: I keep getting strange spots with random values when I apply the algorithm, exactly like in the last picture with the cat that you posted, on the right side near the border. Do you know if there is a way to avoid it?

Thank you.

warmspringwinds · Jun 28 '16

Thank you @yusuketomoto. Batchsize 1 did help; the image looks much nicer and more appealing now.

The border is still there. If you have any idea how to remove it, that would be appreciated. Do you think it has anything to do with the training data?

valentinvieriu · Jun 28 '16

@yusuketomoto, thanks for a great implementation; I really enjoy playing with it. Since the features of the style image are replicated in a pretty granular way when the input image is larger than the style image, I wondered whether you have tried training at a larger resolution than the 256x256 the code resizes to.

I guess training will then take proportionally longer. Just curious whether you have experience doing that. Is there a way to alter the model to create larger features in the output image once it's trained?

ghost · Jul 05 '16

@nikolajspetersen, I'm curious about this too. Furthermore, since the resolution is always 256x256 regardless of the aspect ratio, it tends to distort shapes in the training data. Correct me if I'm wrong or if this is irrelevant.

Currently I'm training a model on a 512 px style image and center crops of the training data at the same size. I will post back comparison images once it's ready.

6o6o · Jul 10 '16

Our assumptions were correct: size does matter, and the result is much more appealing. Here's my picture processed in the style of a Kandinsky painting. Both models were trained with batchsize 1 for 2 epochs.

The first uses the default size of 256. The second uses a 512 px style image and square center crops of the training data, rescaled with bicubic interpolation so that the smallest side is 512. Training took 44 hours on a GTX 970.

The downsides are that cropping loses some of the detail, while upscaling blurs the image slightly. I guess you could drop the cropping and squeeze whatever image it is into square dimensions, though I don't know whether that would cause feature distortion.

(results attached: ya_kandinsky, ya_crop512)

6o6o · Jul 11 '16

@6o6o nice work! I'm using a GTX 1080 with 8 GB RAM and trying to train at size 512, but I get an out-of-memory error at "feature_hat = vgg(y)". Can you tell me how to reduce memory usage?

codevui · Jul 12 '16

@codevui thanks. Check that you have everything right. Basically, just replace all instances of 256 with the desired value and optionally set the resampling filter to 2. It shouldn't use more than 3600 MB. Actually, 8 GB should allow you to go all the way up to 720 px.
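A minimal sketch of the change being described, assuming the image loading lives in train.py and uses PIL; resample=2 is PIL's bilinear filter, and the function name here is illustrative.

```python
import numpy as np
from PIL import Image

image_size = 512  # was 256 in the stock train.py

def load_image(path):
    # squash the image to image_size x image_size; resample=2 selects
    # PIL's bilinear filter, as suggested above
    img = Image.open(path).convert('RGB').resize(
        (image_size, image_size), resample=2)
    return np.asarray(img, dtype=np.float32)
```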

6o6o · Jul 12 '16

@6o6o nice work. It really helps me a lot. By the way, could you please tell me the exact values of lambda_feat and lambda_style you used?

bucktoothsir · Jul 12 '16

@6o6o the problem was cuDNN. After I set up cuDNN correctly for Chainer, everything is OK! Thank you!

codevui · Jul 12 '16

@bucktoothsir, glad to help. I haven't touched those values, as I'm not sure about them; experimenting by trial and error takes too much time. I noticed, though, that in the paper the possible range for lambda_tv is between 1e-4 and 1e-6, while here it's 10e-4. I don't know whether that's a typo or I'm missing something. You can try adjusting lambda_tv and lambda_style and post back if anything interesting comes up.
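For context on what lambda_tv weights, here is an illustrative NumPy sketch of a squared total variation penalty; the repo's actual implementation and normalization may differ.

```python
import numpy as np

def total_variation(x):
    # x: image array of shape (channels, height, width)
    # sums squared differences between neighbouring pixels; the training
    # loss would add lambda_tv * total_variation(output)
    dh = x[:, 1:, :] - x[:, :-1, :]
    dw = x[:, :, 1:] - x[:, :, :-1]
    return float((dh ** 2).sum() + (dw ** 2).sum())
```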

6o6o · Jul 13 '16

@6o6o would you mind sharing your model?

I am also currently training a new one (256). Given how time-intensive model creation is, we should consider opening a central repo with some models. What do you think?

gafr · Jul 18 '16

@gafr, sure, I'm all for it. It would be great if we could collect some good models for everyone to use. Should I just create a new repo, or where do I put them?

6o6o · Jul 19 '16

@6o6o just invited you to a repository; give me 10 minutes for a README and a structure.

gafr · Jul 19 '16

Since I found that subtracting the mean image before the image transformation network is not very effective, I'd like to change the code. However, this change will break backward compatibility with existing models. If you create a repository, please note that.
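For readers following along, the subtraction in question is presumably the usual VGG mean pixel; this is a sketch of my reading of the change with the standard values (BGR order), not the repo's exact code.

```python
import numpy as np

# the standard VGG mean pixel (BGR order); dropping this subtraction in
# front of the transformation network is why old models become
# incompatible with the updated code
VGG_MEAN = np.float32([103.939, 116.779, 123.68])

def subtract_mean(image_bgr):
    # image_bgr: float32 array of shape (height, width, 3) in BGR order
    return image_bgr - VGG_MEAN
```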

yusuketomoto · Jul 19 '16

@yusuketomoto would you give me permission to collect/share your current model in a repository, or should I just link to your repository? I will note down the parameters and/or version.

gafr · Jul 19 '16

@gafr I updated the code, models, and README text. Sorry for the inconvenience!

yusuketomoto · Jul 19 '16

Some collected models are now available at https://github.com/gafr/chainer-fast-neuralstyle-models

Please mind that all of them were trained with the old version; they will be updated soon.

gafr · Jul 19 '16

@6o6o The second result was great! Thanks for your work! I trained some models but the results were terrible. Can you help me with some details?

  1. Training on the entire dataset takes too much time. Do you have experience with how the number of training images affects results? How much difference is there between 10,000 images and the entire dataset?
  2. Does "cropping" mean keeping the aspect ratio while resizing the smaller side of the image to 512, and then taking a square center crop? I replaced "256" in "train.py" with "512"; do I also need to change the size in "net.py"?
  3. Apart from the image preprocessing, are all other parameters at their default settings?

logic1988 · Jul 25 '16

  1. Reducing the dataset to 10k significantly degrades the quality. Definitely not recommended.
  2. Scale the images so that the smallest side is the desired value, preserving the aspect ratio, then crop (see the sketch after this list). This is optional. I read here that the VGG models were trained like this, so I thought the same technique could be applied to styles. Changes should be made to train.py only.
  3. All parameters at default. You can try increasing --lambda_tv to something like 10e-3; lowering it as proposed in the paper tends to produce more artifacts.
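A minimal sketch of the preprocessing described in point 2, using PIL; the function name and filter choice are illustrative, not the repo's code.

```python
from PIL import Image

def resize_and_center_crop(img, size=512):
    # scale so the smallest side equals `size`, preserving aspect ratio
    w, h = img.size
    scale = size / float(min(w, h))
    img = img.resize((int(round(w * scale)), int(round(h * scale))),
                     Image.BICUBIC)
    # then take a square center crop
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```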

If you wish, I can fork the project and integrate the changes. I don't think a PR is a good idea, since the enhanced settings take considerably longer to train and may not be suitable for everyone.

6o6o · Jul 25 '16

@6o6o Write to me at my email ([email protected]). I would like to cooperate with you.

rogozin70 · Jul 25 '16

Has anyone implemented a video version? It should be simple, considering the speed of this algorithm.

rayset · Jul 26 '16

@6o6o Thank you very much! Please fork the project so I can learn from it. Looking forward to more communication.

logic1988 · Jul 26 '16

@rayset I've done some experiments. The result is quite stable, preserving consistency between frames for moving objects and transitions, except for some jitter in the backgrounds, areas with subtle color changes, compression artifacts, etc. See here and here.
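For anyone wanting to try this, a rough sketch of per-frame stylization with OpenCV; `stylize` is a hypothetical wrapper around a trained transformation network, not a function this repo provides.

```python
import cv2

def stylize_video(in_path, out_path, stylize):
    # stylize: hypothetical callable mapping one BGR uint8 frame
    # (H, W, 3) to a stylized BGR uint8 frame of the same size
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out = stylize(frame)
        if writer is None:
            h, w = out.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
        writer.write(out)
    cap.release()
    if writer is not None:
        writer.release()
```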

6o6o · Jul 26 '16