
removing the 'dots'

Heartsie opened this issue 9 years ago · 69 comments

Has anyone had any luck removing the dots from the final image? I have found that this method can produce some very cool results: [image: karya-oil_04_1_51000]

But the results are really kind of unusable with these dot-like artifacts everywhere. Does anyone know if there is a way to train a style without this "dithering"?

Heartsie avatar Aug 09 '16 13:08 Heartsie

I'm creating checkpoint models and afterward I select the best model I can find, but yeah, there are always these dots... I'm also producing 4K images for video, so the dots will be hidden in the mp4 encoding artefacts.

Did you try https://github.com/DmitryUlyanov/texture_nets? I think there are no dots in that implementation.

ttoinou avatar Aug 09 '16 19:08 ttoinou

I have tried it! It is great in some ways, but it has its own issues: https://github.com/DmitryUlyanov/texture_nets/issues/28

The problem with 4K is that the "brush size" would be super small, wouldn't it? Is there a way to scale up the pattern without having to train with a super huge style image? Also, I assume you have to do 4K on the CPU, right? That must be pretty slow.

Heartsie avatar Aug 09 '16 22:08 Heartsie

Yes you're right (brush size impossible to change, huge training) but I'm ready to let my computer crunch data for 40 hours straight ^^ . I'm creating videos for a music album.

I'm dividing the 4K image into 9 or 16 tiles, processing them on the GPU and then joining them on the CPU. Joining is slow because I'm not a good Python programmer, but I'm looking into multiprocessing right now.
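
For reference, a minimal sketch of that split-and-rejoin step, assuming Pillow and a hypothetical `stylize(tile)` callable standing in for a generate.py call; hard seams can appear at tile borders unless the tiles overlap:

```python
from PIL import Image

def stylize_tiled(img, stylize, grid=3):
    # Split img into grid x grid tiles, stylize each one, paste back.
    # Edge pixels are silently dropped if the size isn't divisible by grid.
    w, h = img.size
    tw, th = w // grid, h // grid
    out = Image.new('RGB', (grid * tw, grid * th))
    for gy in range(grid):
        for gx in range(grid):
            box = (gx * tw, gy * th, (gx + 1) * tw, (gy + 1) * th)
            out.paste(stylize(img.crop(box)), box[:2])
    return out
```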

ttoinou avatar Aug 09 '16 22:08 ttoinou

@Heartsie a quick hack would be to upscale the image about 1.2 times, do the transformation, and then downscale back to the original size. Downscaling will blend the dots together, but it may produce slightly smaller features.
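
A minimal sketch of that hack, assuming Pillow and a hypothetical `stylize(img)` callable in place of generate.py:

```python
from PIL import Image

def stylize_supersampled(img, stylize, factor=1.2):
    # Upscale, stylize, then downscale back; the final downscale
    # blends the dots at the cost of slightly smaller features.
    w, h = img.size
    big = img.resize((int(w * factor), int(h * factor)), Image.LANCZOS)
    return stylize(big).resize((w, h), Image.LANCZOS)
```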

I'm also curious about upscaling feature size without retraining. I assume this might be possible, though I'm not enough of an ML expert to say for sure.

6o6o avatar Aug 09 '16 23:08 6o6o

FWIW, I'm seeing those same sorts of dots in a lot of my work with https://github.com/DmitryUlyanov/texture_nets

chrisnovello avatar Aug 09 '16 23:08 chrisnovello

I have found that texture_nets is slightly better with the artifacts, and the training is a LOT faster. But unfortunately the artistic quality of the image just isn't there. It feels more like a grid-like filter than an actual painting.

Are the dots static from image to image? If so, we could train on a solid grey style image, then use that model to make a stylized image composed of only dots. Those dots could then be subtracted from the artistic images to remove the static pattern.
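
A sketch of that subtraction idea, assuming numpy/Pillow, a hypothetical grey-style model whose output is saved as an image, and, crucially, that the dot pattern adds linearly on top of the style (a big assumption, since the network is nonlinear):

```python
import numpy as np
from PIL import Image

def subtract_dots(styled_path, dots_path, out_path, grey=128):
    # dots_path is the output of the grey-style model on the same
    # content image; its deviation from flat grey is treated as the
    # static dot pattern and subtracted from the artistic output.
    styled = np.asarray(Image.open(styled_path), dtype=np.int16)
    dots = np.asarray(Image.open(dots_path), dtype=np.int16)
    cleaned = styled - (dots - grey)
    Image.fromarray(np.clip(cleaned, 0, 255).astype(np.uint8)).save(out_path)
```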

If the dots move between generations, then maybe an image could be generated 5 times and the results averaged together.
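
And a sketch of the averaging fallback, again with a hypothetical `stylize(img)` callable; this only helps if the pattern really does vary from run to run:

```python
import numpy as np
from PIL import Image

def stylize_averaged(img, stylize, n=5):
    # Average n stylized generations; static artifacts survive this,
    # but artifacts that move between runs get smoothed out.
    acc = np.zeros((img.size[1], img.size[0], 3), dtype=np.float64)
    for _ in range(n):
        acc += np.asarray(stylize(img).convert('RGB'), dtype=np.float64)
    return Image.fromarray((acc / n).round().astype(np.uint8))
```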


Heartsie avatar Aug 09 '16 23:08 Heartsie

They depend on many factors: the style image, the training data. Sometimes they may be barely visible, and quite pronounced in other cases. I've also noticed they tend to fade away with additional training; the 4th epoch produces a smoother image than the 2nd.

6o6o avatar Aug 10 '16 00:08 6o6o

4th! That must take forever. How many images do you train with? The full COCO?

Heartsie avatar Aug 10 '16 00:08 Heartsie

Yep :) I don't do it regularly; I did it just once, for testing. Here are a few samples. This style had a particularly pronounced pattern, but most of the time it's not worth the effort. [images: ya_edtaonisl_1_crop, ya_edtaonisl_3_crop]

6o6o avatar Aug 10 '16 00:08 6o6o

Wow, how long did that take? I have a 1070 and I bet that would take me 2 days with a 512 size style image

Heartsie avatar Aug 10 '16 02:08 Heartsie

4 days. A 1070 should be twice as fast out of the box. Plus, theoretically you could double that speed by leveraging the FP16 datatype, so 4 times faster in total, though I'm not sure it's widely supported yet; you may need to tweak the code.

6o6o avatar Aug 10 '16 02:08 6o6o

I thought FP16 just gave you 2x the memory and that the operations were still regular float ops. I'll have to test this 4-epoch thing. From what I've seen so far, the model is never stable and doesn't really get better when trained for too long.

ttoinou avatar Aug 10 '16 08:08 ttoinou

Only do it if you're not satisfied with your current results. Usually it hardly gets any better, and sometimes earlier epochs may even look superior.

UPD: NVIDIA turned out to be greedy bastards, artificially limiting FP16 compute performance on the 10xx series of consumer cards. You'd have to go for a higher-end card based on the GP100 chip, namely a Tesla or Titan XP, to benefit from the speed boost. This is common practice among big companies: intentionally crippling lower-end products so as not to hurt higher-end sales. I hope AMD will kick their ass, though this is unlikely since they have poor software support and OpenCL isn't as efficient as CUDA at the moment.

6o6o avatar Aug 10 '16 12:08 6o6o

> Are the dots static from image to image?

I'm saving a checkpoint model every 2000 training images, which gives an animation when run through generate.py. Results: the dots are static, but their intensities change (not always decreasing). Sometimes I can see a big ugly blob appear and then disappear over the next 2000 iterations.

I wonder how the images used for training affect this evolution, for example whether some bad images produce the blobs...

Maybe I should upload a GIF.
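
Something like this would stitch the checkpoint outputs into a GIF, assuming imageio and that generate.py has already written one frame per saved model (the frames/out_*.png names here are made up):

```python
import glob
import re

import imageio

# Sort numerically so out_10000.png comes after out_2000.png.
paths = sorted(glob.glob('frames/out_*.png'),
               key=lambda p: int(re.search(r'(\d+)', p).group(1)))
imageio.mimsave('evolution.gif',
                [imageio.imread(p) for p in paths],
                duration=0.2)  # 0.2 s per frame
```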

> NVIDIA turned out to be greedy bastards

I used to think like that. I don't have a lot of information about NVIDIA, but it seems like a good company providing good, cheap hardware. I can understand that FP16 is useless for the people targeted by the 10xx GPUs. If you really need FP16, you'll pay the price so that NVIDIA engineers can develop the kind of features you'll need in a few years. Or you can wait for the feature to be included in the next generation of gamer GPUs :) .

ttoinou avatar Aug 10 '16 15:08 ttoinou

So I am halfway through the third epoch. The image is looking pretty good, but I am still getting a pattern of dots. [image: karya-candy-512_2_37000]

Here you can see a closeup: [image: dots]

And, as ttoinou mentioned, I am getting these blobs every now and then. They don't seem to be going away with more training. [image: blob]

I suppose I just need to wait for a code update?

Heartsie avatar Aug 11 '16 12:08 Heartsie

@Heartsie That result is really good. I have trained this style before, but the output was terrible; I used the default params and an image size of 512. What params did you use in your training? [image: screenshot]

The dots in your result are not obvious. One way to improve them further is to increase the style image size, but then the style image itself has to be changed at the same time. And the blobs can be dodged by selecting a checkpoint model whose blobs happen to fall in a corner.

logic1988 avatar Aug 11 '16 12:08 logic1988

@Heartsie what parameter set did you use?

cryptexis avatar Aug 11 '16 13:08 cryptexis

I went through two and a half epochs of the full COCO dataset, 512 image size, 512 style size. Nothing too crazy. Here is the style image: [image: candy-512] And here is the model if you want to play with it. Just rename it to .model

candy-512_2_49000.zip

I stopped it early, but I was training for like 30 hours on my 1070. It's a slow process. I might resume the training later and see if the dots get any better.

Heartsie avatar Aug 11 '16 14:08 Heartsie

@logic1988 what do you mean by "the blobs can be dodged by selecting a checkpoint model whose blobs happen to fall in a corner"?

Heartsie avatar Aug 11 '16 14:08 Heartsie

@Heartsie Hi, you wrote "512 image size, 512 style size". Does that mean setting the parameter image_size to 512 and style_size to 512? I can find image_size in train.py but can't find any param like style size. Does "512 style size" mean using a style image file of size 512?

xpeng avatar Aug 11 '16 16:08 xpeng

Image size is style size. Just set image_size to 512 and the training images will be automatically adjusted.

6o6o avatar Aug 11 '16 16:08 6o6o

@6o6o got it, thanks

xpeng avatar Aug 11 '16 17:08 xpeng

@Heartsie did I get it right that you left the other parameters untouched? Like lambda_style, lambda_feat and so on...

cryptexis avatar Aug 11 '16 18:08 cryptexis

untouched! Although I do want to experiment with them.


Heartsie avatar Aug 11 '16 20:08 Heartsie

I just looked at the COCO dataset images, and most have one dimension of 512 and the other smaller than 512. Do you think that could be causing issues with the noise? Or are images smaller than the training size OK?

3DTOPO avatar Aug 11 '16 21:08 3DTOPO

Could be... Is there any code in there to scale everything up to at least 512x512? That might be a good idea.

Heartsie avatar Aug 11 '16 21:08 Heartsie

I've already implemented scaling up when the image size is less than the specified value: https://github.com/yusuketomoto/chainer-fast-neuralstyle/blob/master/train.py#L12-L19
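
Roughly, the idea behind that snippet is along these lines (a paraphrase for illustration, not the linked code verbatim):

```python
from PIL import Image

def load_training_image(path, image_size=512):
    # Upscale so the smaller side reaches image_size, then center-crop
    # to a square; nothing ends up smaller than the training size.
    img = Image.open(path).convert('RGB')
    w, h = img.size
    if min(w, h) < image_size:
        scale = image_size / float(min(w, h))
        img = img.resize((int(w * scale + 0.5), int(h * scale + 0.5)),
                         Image.BICUBIC)
        w, h = img.size
    left, top = (w - image_size) // 2, (h - image_size) // 2
    return img.crop((left, top, left + image_size, top + image_size))
```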

I think the dots are likely caused by one of:

  • initialization weights of network layers
  • total variation
  • optimizer(?)

yusuketomoto avatar Aug 11 '16 22:08 yusuketomoto

> Do you think that could be causing issues with the noise?

I've deleted images that were too small and I still have noise.

> initialization weights of network layers

Seems plausible from the images I get of the evolution of the trained model.

ttoinou avatar Aug 11 '16 22:08 ttoinou

The images are always upscaled so that the smallest side is image_size. The problem is that most of the images in the dataset are 640x480, and 480 -> 512 causes slight blurring, but that's OK I guess, not a big deal. The problem would arise if you wanted to go higher: on an 8GB GPU you could technically go up to 720px, and that's nearly 2x upscaling, which is not a good thing. The solution would be to redownload every image from Flickr in its original size, cropping and scaling to the desired dimensions at the same time. This would also take resizing out of the training loop, saving about 0.05 sec per image, which adds up to more than an hour per epoch, but it's still probably overkill for most cases.

6o6o avatar Aug 11 '16 22:08 6o6o

@yusuketomoto do you think any of these techniques would result in any benefit?

  • using deeper networks for feature extraction, like VGG19 or GoogLeNet
  • using higher / more layers when training (conv3_3, conv4_3) to pick up more complex features
  • using L-BFGS if that's possible

Thanks

6o6o avatar Aug 11 '16 22:08 6o6o