chainer-fast-neuralstyle
About the result
First, awesome work; this seems to be the first re-implementation of the paper.
Currently, the results seem a little worse than vanilla neural style or the results in the "perceptual loss" paper. Maybe the low TV weight leads to the noisy, grid-like results. Are all the hyperparameters the same as the paper's?
Thanks.
Yes, I know my result has a little block noise. The model this repo contains was not trained with the same hyperparameters as the paper's. I didn't know the exact values of lambda_feat and lambda_style when I trained the model, because I couldn't find them in the paper. But I found them here, and I'm going to update the repo!
It seems the lambda_style I used was too large. I also think that a uniform lambda_style weight might not be good, because I found that the gram_matrix losses are quite different across layers.
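To make that concrete, here is a minimal sketch of per-layer style weights in place of one uniform lambda_style; the layer names, weight values, and dict-based interface are hypothetical and just illustrate the idea, they are not what this repo uses:

```python
import chainer.functions as F

# Hypothetical per-layer weights replacing a single uniform lambda_style
style_weights = {'relu1_2': 0.5, 'relu2_2': 1.0, 'relu3_3': 1.5, 'relu4_3': 3.0}

def style_loss(grams_hat, grams_style, weights=style_weights):
    # Sum per-layer Gram-matrix MSEs, each scaled by its own weight.
    # grams_hat / grams_style are assumed to be dicts of per-layer Gram matrices.
    return sum(w * F.mean_squared_error(grams_hat[k], grams_style[k])
               for k, w in weights.items())
```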
Yes, there are so many hyper-parameters, going back to the seminal work "A Neural Algorithm of Artistic Style". Currently most re-implementations use uniform lambda_style weights. I have done lots of experiments based on Justin's implementation. It seems your current result only learned the low-level style, e.g. color, and not higher-level features such as strokes. Also, in the reddit thread, Justin offers supplementary material with more details; it may be helpful for you.
@yusuketomoto any news? What weights have you tried?
I tried lambda_feat=1.0 with lambda_style from {1.5, 3.0, 5.0, 7.0, 10.0}, with all other weights at their defaults. All results are noisy and have these vertical stripes. How did you manage to train the default starrynight.model that is included in this repo?
It might be that your convolution kernel size is 4, whereas the original VGG uses 3, which should give more precise graphics.
Thanks for sharing the code. I trained with the default parameters in this repo on the MS COCO 2014 dataset and tried several style images including starrynight; the results are all noisy and weird. Could anyone reproduce the default starrynight.model in this repo?
@mionyu I'm getting the same here. I've trained on several pictures and am getting unusable results. I'm getting a strange frame effect on all the models, no matter what the source is.
I really wanted to use this in an art project next weekend, but unfortunately the results are not acceptable at all. See the examples. Any help is welcomed. I've used the standard command line without altering any parameters.
@valentinvieriu Try batchsize 1; you might get better results. I can't say for sure, but I suspect there may be a bug in batch_matmul or in the style loss computation when the batch size is greater than 1.
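For reference, here is a minimal sketch of how a batched Gram matrix is commonly computed with batch_matmul in Chainer; this is my own sketch of the relevant computation, not necessarily identical to net.py, but it shows where a batch-related normalization issue could hide:

```python
import chainer.functions as F

def gram_matrix(y):
    # y: feature map Variable of shape (batch, channels, height, width)
    b, ch, h, w = y.data.shape
    features = F.reshape(y, (b, ch, h * w))
    # Multiply each (ch, h*w) feature matrix by its own transpose,
    # then normalize by the number of elements in the feature map.
    return F.batch_matmul(features, features, transb=True) / (ch * h * w)
```

If the style target's Gram matrix is computed for a single image but compared against a batch of these without correct broadcasting or averaging, batchsize 1 would indeed behave better than larger batches.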
@yusuketomoto Hello.
Thank you for a nice implementation.
I have one question: I keep getting strange spots with random values when I apply the algorithm, exactly like in the last cat picture you posted, on the right side near the border. Do you know if there is a way to avoid it?
Thank you.
Thank you @yusuketomoto. Batchsize 1 did help. The image looks nicer and more appealing now.
The border is still there. If you have any idea how to remove it, that would be appreciated. Do you think it has anything to do with the training data?
@yusuketomoto, thanks for a great implementation here, I really enjoy playing with it. Since the features of the style image are replicated in a pretty granular way when the input image is larger than the style image, I wondered if you have tried training with a larger resolution than the 256x256 that the code resizes to.
I guess training will then take proportionally longer. Just curious whether you have experience with doing that? Is there a way to alter the model to create larger features in the output image once it's trained?
@nikolajspetersen, I'm curious about this too. Furthermore, since the resolution is always 256x256 regardless of the aspect ratio, it tends to distort shapes in the training data. Correct me if I'm wrong or if this is irrelevant.
Currently I'm training a model on 512 px style image and center crops of training data of the same size. Will post back comparison images once it's ready.
Our assumptions were correct. Size does matter. The result is much more appealing. Here's my picture processed in the style of a Kandinsky painting. Both models were trained with batchsize 1 for 2 epochs.
The first uses the default size of 256. The second uses a 512 px style image and square centered crops of the training data, rescaled with bicubic interpolation so that the smallest side is 512. Training took 44 hours on a GTX 970.
The downsides are that cropping loses some of the detail, while upscaling blurs the image slightly. I guess you could skip cropping and squeeze whatever image it is into square dimensions; I don't know whether that would result in feature distortion, though.
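As an illustration of the rescale-and-crop step described above, here is a minimal sketch; the function name and the use of PIL are my own choices, not code from this repo:

```python
from PIL import Image

def load_training_image(path, size=512):
    # Resize so the smallest side equals `size`, preserving the aspect ratio,
    # then take a centered square crop of size x size.
    img = Image.open(path).convert('RGB')
    w, h = img.size
    scale = float(size) / min(w, h)
    img = img.resize((int(round(w * scale)), int(round(h * scale))), Image.BICUBIC)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```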
@6o6o nice work! I'm using a GTX 1080 with 8GB RAM and tried to train with size 512, but I get out of memory at "feature_hat = vgg(y)". Can you tell me how to reduce the memory usage?
@codevui thanks. Check that you have everything right. Basically, just replace all instances of 256 with the desired value and optionally set the resampling filter to 2. It shouldn't be using more than 3600 MB. Actually, 8GB should allow you to go all the way up to 720 px.
@6o6o nice work, it really helps me a lot. By the way, could you please tell me the exact values of lambda_feat and lambda_style you used?
@6o6o the problem was cuDNN. After I set up cuDNN correctly for Chainer, everything is OK! Thank you!
@bucktoothsir, glad to help. I haven't touched those values, as I'm not sure about them, and experimenting by trial and error takes too much time. I noticed, though, that in the paper the suggested range for lambda_tv is between 1e-4 and 1e-6, while here it's 10e-4. I don't know if that's a typo or if I'm missing something. You can try adjusting lambda_tv and lambda_style and post back if anything interesting comes up.
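For context on what lambda_tv actually weights, here is a minimal sketch of a total variation penalty on the generated image; this is a generic formulation assuming a Chainer version that supports slicing Variables, and the implementation in this repo may compute it differently:

```python
import chainer.functions as F

def total_variation(x):
    # x: generated image Variable of shape (batch, channels, height, width).
    # Penalizes differences between neighboring pixels to suppress noise.
    diff_h = x[:, :, 1:, :] - x[:, :, :-1, :]
    diff_w = x[:, :, :, 1:] - x[:, :, :, :-1]
    return F.sum(diff_h ** 2) + F.sum(diff_w ** 2)
```

Raising lambda_tv trades fine texture for smoothness, which is why lowering it can bring back noise and artifacts.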
@6o6o would you mind sharing your model?
I am also currently training a new one (256). Given how time-intensive model creation is, we should consider opening a central repo with some models. What do you think?
@gafr, sure. I'm all for it. Would be great if we could collect some good models for everyone to use. Should I just create a new repo? Or where do I put them?
@6o6o just invited you to a repository, give me 10 minutes for a README and a structure
Since I found that subtracting the mean image before the image transformation network is not very effective, I'd like to change the code. However, this change will break backward compatibility with existing models. If you create a repository, please note that.
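For readers unfamiliar with the change being discussed, this is a rough sketch of the usual VGG-style mean subtraction; the exact form (mean pixel vs. full mean image), values, and channel order used in this repo are assumptions on my part:

```python
import numpy as np

# VGG-style mean pixel in BGR order (assumed values, not necessarily this repo's)
MEAN_PIXEL = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess(image):
    # image: H x W x 3 uint8 array in RGB order
    x = image[:, :, ::-1].astype(np.float32)   # RGB -> BGR
    x -= MEAN_PIXEL                            # subtract the mean pixel
    return x.transpose(2, 0, 1)[np.newaxis]    # to NCHW for the network
```

Whether this subtraction happens before the image transformation network or only before the loss (VGG) network changes what the trained weights expect as input, which is why removing it breaks compatibility with previously trained models.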
@yusuketomoto would you give me permission to collect/share your current model in a repository, or should I just link to your repository? I will note down the parameters and/or version.
@gafr I updated the code, models, and README text. Sorry for the inconvenience!
Some collected models are now available at https://github.com/gafr/chainer-fast-neuralstyle-models
Please note that all of them were trained with the old version; I will update them soon.
@6o6o The second result was great! Thanks for your work! I trained some models but the results were terrible. Can you help me with some details?
- Training on the entire dataset takes too much time. Do you have experience with the impact of the number of training images? How big is the difference between 10000 images and the entire dataset?
- Is "cropping" meaning keep the aspect ratio invariant while resize the smaller side of image to 512, and then square centered crop it? I replaced "256" in "train.py" to "512", Do I need to change the size in "net.py"?
- Apart from the image preprocessing, are all other parameters at the default settings?
- Reducing the dataset to 10k significantly deteriorates the quality. Definitely not recommended.
- Scale the images so that the smallest side has the desired value, preserving the aspect ratio, then crop. This is optional. I read here that the VGG models were trained like this, so I thought the same technique could be applied to styles. Changes should be made to train.py only.
- All parameters at their defaults. You can try increasing --lambda_tv to something like 10e-3. Lowering it as proposed in the paper tends to produce more artifacts.
If you wish, I can fork the project and integrate the changes. I don't think a PR is a good idea, since the enhanced settings take considerably longer to train and may not suit everyone.
@6o6o Write to me at my email - [email protected] I would like to cooperate with you.
Has anyone implemented a video version? It should be simple considering the speed of this algorithm.
@6o6o Thank you very much! Please fork the project so I can learn from it. Looking forward to more communication.