
How can I train models with my own datasets?

Open mike199212 opened this issue 7 years ago • 23 comments

After receiving your answer to Issue #22, I have been trying to train models with my own datasets, but I have not gotten any good results yet, so I would like to ask for a more detailed explanation of training.

My current situation, roughly: I was able to run train_128.py and train_x2.py, and I created my models using a small dataset as a first trial. But when they are installed in /cgi-bin/paint_x2_unet/models/ as unet_128_standard and unet_512_standard, they generate green images from every line drawing. The smaller (128, 128) output from unet_128_standard is already bad. It may be because the dataset is too small or the training is insufficient, but I suspect that I do not understand the correct way to train the models. I used only 30 pairs of (color, line) images and trained for 200 epochs with train_128.py, but not even over-fitting occurred:

epoch   cnn/loss   cnn/loss_rec   cnn/loss_adv   cnn/loss_tag   cnn/loss_l   dis/loss
10      160.816    157.485        1.58426                       1.74656      4.05362
200     168        157.124        9.09956                       1.77699      0.678594

My environment: Amazon EC2 p2.xlarge, Python 3.5.2, cv2 module 3.1.0, Chainer 1.20.0.1, PaintsChainer master as of Feb 1.

What I have tried is as follows:

Downloading your pre-trained models and running server.py to check that I have successfully installed the requirements.

Fixing small bugs (at least to my eyes they seem to be bugs). At line 132 of lnet.py, the method LNET.calc() returns a tuple (d0, e), but the variable e is not declared in this scope, so it leads to an error. LNET.calc() is called at lines 138 and 139 of train_128.py (where ganUpdater.loss_cnn() is defined), and the second element of its return value is never used, so I changed the return value of LNET.calc() to the tuple (d0, None). (Though this is an ad hoc fix.) Secondly, in main() of train_128.py (and also of train_x2.py), the variable out_dir is used without being initialized, so I added out_dir = args.out.
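Roughly, the two patches look like this; only the two changed lines reflect my actual edit, and the rest is placeholder scaffolding, not code from the repository:

```python
# Sketch of the two ad hoc fixes described above. Only the two patched lines
# come from this comment; the surrounding scaffolding is hypothetical.

class LNET:
    def calc(self, x):
        d0 = ...                 # whatever the upstream method computes for d0
        # return d0, e           # original (~line 132): `e` is undefined -> NameError
        return d0, None          # patched: train_128.py never uses the second element


def main(args):
    # train_128.py / train_x2.py: out_dir was used without being initialized,
    # so set it from the parsed command-line arguments.
    out_dir = args.out
    return out_dir
```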

Reading Issue #13. I prepared pairs of 3-channel PNG images (color image, line image) of size (128, 128) and put them in paint_x2_unet/color and paint_x2_unet/line. I resized them to (512, 512) and put them in paint_x2_unet/colorx2 and paint_x2_unet/linex2. I wrote their names in paint_x2_unet/dat/images_color_train.dat. As for the image formats, I also tried JPEG, but the bad green result did not change.
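The preparation step was roughly like this; the directory names are the ones listed above, and the .dat format (one file name per line) is my assumption about what train_128.py expects, not something I have verified:

```python
# Minimal dataset-preparation sketch based on the layout described above.
# The .dat format (one file name per line) is an assumption.
import os
import cv2

color_dir, line_dir = "paint_x2_unet/color", "paint_x2_unet/line"
names = sorted(os.listdir(color_dir))

for src, dst in [(color_dir, "paint_x2_unet/colorx2"),
                 (line_dir, "paint_x2_unet/linex2")]:
    os.makedirs(dst, exist_ok=True)
    for name in names:
        img = cv2.imread(os.path.join(src, name), cv2.IMREAD_COLOR)  # 3-channel, 128x128
        big = cv2.resize(img, (512, 512), interpolation=cv2.INTER_CUBIC)
        cv2.imwrite(os.path.join(dst, name), big)

with open("paint_x2_unet/dat/images_color_train.dat", "w") as f:
    f.write("\n".join(names) + "\n")
```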

Additionally, I commented out line 66 of train_128.py: serializers.load_npz("models/liner_f", l)

I first ran train_128.py to get model_final. Then I edited train_x2.py to load it as the network cnn_128 and ran train_x2.py. I copied the model_final generated by train_128.py to paint_x2_unet/models as unet_128_standard, and the one generated by train_x2.py to the same directory as unet_512_standard.
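The train_x2.py edit amounts to something like this; the unet.UNET class name and the model path are my approximations based on the repository layout, not the exact upstream code:

```python
# Rough sketch of loading the 128-px model produced by train_128.py into the
# cnn_128 network used by train_x2.py. Class name and paths are approximate.
from chainer import serializers
import unet  # paint_x2_unet/unet.py

cnn_128 = unet.UNET()
serializers.load_npz("paint_x2_unet/models/model_final", cnn_128)
```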

mike199212 avatar Feb 06 '17 12:02 mike199212

It seems to be a mistake or bug in the OpenCV channel order when reading and writing BGR or RGB. We patched the server execution code to check the OpenCV version, but if the training code used the wrong channel order, this can occur...

taizan avatar Feb 06 '17 12:02 taizan

Thank you for your response and the information. I will proceed while paying attention to that.

mike199212 avatar Feb 07 '17 01:02 mike199212

I encountered the same problem... Does it mean that I cannot use the RGB image format for training?

czxrrr avatar Feb 08 '17 12:02 czxrrr

@mike199212 Hello, may I ask if you successfully trained your model and avoided the green picture?

czxrrr avatar Feb 08 '17 13:02 czxrrr

@czxrrr Green pictures are probably caused by a mismatch of the RGB channel order.

Find this line and try to work out which channel order you should adapt to:

https://github.com/pfnet/PaintsChainer/commit/8bf41ed7c0083211d6bcaa4f9cc03258cc204a94#diff-a0143a1f10c063528854540f5f51196e
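For example, cv2.imread gives BGR by default, so you can test both orders like this and keep whichever one does not turn green (the file name is just a placeholder):

```python
# cv2.imread returns BGR by default; try feeding the line art to the network
# both as-is and with the channels reversed, and keep whichever order does
# not produce the green output.
import cv2

img_bgr = cv2.imread("test_line.png", cv2.IMREAD_COLOR)   # OpenCV default order: BGR
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)         # same pixels, channels swapped
# ...run each variant through the model and compare the colorized results
```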

abbychau avatar Feb 08 '17 15:02 abbychau

@abbychau I tried RGB and BGR... but it always outputs a green picture no matter what picture I feed to it. I am curious about your original dataset. Were you using 4-channel PNG, 3-channel JPEG, or something else? Thanks for your patient help!

czxrrr avatar Feb 09 '17 03:02 czxrrr

Does your local version of PaintsChainer work well? If the problem appears only in the output of the re-trained model, it can be a failure of learning with the GAN. Please make sure your training works well without the GAN first.

taizan avatar Feb 09 '17 05:02 taizan

@taizan Do you mean using the StandardUpdater()?

czxrrr avatar Feb 09 '17 08:02 czxrrr

No, comment out loss = loss_rec + loss_adv + loss_l and use only loss = loss_rec. You will get sepia images if it works well.

There are two suspicious problems:

  • Input files are read and written through OpenCV, but its default channel order depends on the environment.
  • Training with the adversarial network will sometimes collapse, and then the output is ruined.

I cannot judge which problem is yours at the moment.

Also, I strongly recommend training step by step, so check the result of model_of_128 first.

taizan avatar Feb 09 '17 16:02 taizan

I think the reason is the first one, about the channels. I found that the first training run makes all the pixels become 0 except for the green channel, which means only the green channel has non-zero values... and no matter whether I use BGR or RGB, it turns green all the time. (I am using Python 3 with OpenCV 3.2.0.)

czxrrr avatar Feb 13 '17 04:02 czxrrr

Hi, I got the same problem in a new environment setup. The problem is a mismatch of the cuDNN version and the GPU. Pascal and some newer GPUs do not support old cuDNN versions, and that causes the all-green output.
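You can quickly check whether Chainer actually sees CUDA and cuDNN; these flags are from the Chainer 1.x API used in this thread and the names may differ in later versions:

```python
# Quick environment check: confirm Chainer picks up CUDA and cuDNN.
# Flags are from the Chainer 1.x API; names may differ in newer versions.
import chainer
from chainer import cuda

print("chainer:", chainer.__version__)
print("cuda available:", cuda.available)
print("cudnn enabled:", cuda.cudnn_enabled)
```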

taizan avatar Feb 15 '17 05:02 taizan

My CUDA version is 7.5. Does that mean CUDA 8.0 or higher should be installed?

czxrrr avatar Feb 15 '17 09:02 czxrrr

It could be a cause of the trouble, and it depends on your GPU.

taizan avatar Feb 15 '17 14:02 taizan

@taizan Thank you. I am currently training my model, and I found that after training, the output for a line drawing looks like this (YUV):

[ 3.9902513 4.66042328 5.40228987] [ 3.48818111 3.29726362 4.91220808] [ 2.51852775 2.57776284 3.40684319]]]

So the YUV gets converted into RGB like this:

[ 0 123 0] [ 0 124 0] [ 0 124 0]]]

All the YUV pixel values are near 0, so the green channel of the RGB image ends up near 128 and all the other channels are 0. That is why green images haunt me all the time.

Do you have any suggestion as to why the output YUV is so close to 0? Does it mean the number of my training images is not enough? I only fed 1000 images to the model.
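For reference, an output near zero in all three YUV channels is exactly what becomes green: OpenCV's 8-bit YUV convention offsets the chroma channels by 128, so Y = U = V = 0 means "no luma, chroma at -128", which clips to roughly (0, 125, 0) after conversion. A tiny check:

```python
# Why near-zero YUV becomes green: OpenCV's 8-bit YUV offsets U and V by 128,
# so an all-zero YUV pixel has chroma -128 and clips to roughly (0, 125, 0).
import numpy as np
import cv2

yuv = np.zeros((1, 1, 3), dtype=np.uint8)        # one pixel with Y = U = V = 0
rgb = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)
print(rgb)                                        # approximately [[[  0 125   0]]] -> green
```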

czxrrr avatar Feb 23 '17 04:02 czxrrr

Please make sure you are using a suitable version of cuDNN, and make sure the OpenCV input & output is handled correctly.
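For example, a simple round-trip check like this (the file name is just a placeholder) shows whether the OpenCV read/write path itself changes the data or produces implausible channel statistics:

```python
# Sanity check for the OpenCV input/output path; the file name is a placeholder.
import cv2
import numpy as np

img = cv2.imread("paint_x2_unet/color/example.png", cv2.IMREAD_COLOR)
cv2.imwrite("roundtrip_check.png", img)
back = cv2.imread("roundtrip_check.png", cv2.IMREAD_COLOR)

print("per-channel means (B, G, R):", img.reshape(-1, 3).mean(axis=0))
print("round trip unchanged:", np.array_equal(img, back))
```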

taizan avatar Feb 23 '17 11:02 taizan

I apologize for the late reply. After I re-installed OpenCV in the same way described in the installation guide, that is, conda install -c menpo opencv3, the problem was solved in my environment.

mike199212 avatar Mar 03 '17 12:03 mike199212

OK, that's good.

taizan avatar Mar 04 '17 02:03 taizan

@czxrrr Hi, did you solve your problem? I have the same problem as you. I currently use Windows 10 64-bit, Python 3.5, OpenCV 3.2, a GTX 1070, CUDA 8.0, and cuDNN 5.1 (which should be a high enough version for a Pascal GPU). The local server works well with the pre-trained model, but the output is always green when I use my own trained model, even after I remove the GAN in train_128. OpenCV and CUDA seem to work well when I test them with other commands. I am not a programmer, but I hope my description is clear enough. If you make any progress on solving this, please let me know. Thanks!

shengyu-meng avatar Mar 21 '17 09:03 shengyu-meng

@taizan @czxrrr @liyourk Hi, thanks for your great work and discussions. I have the same problem of green results with my own training dataset. I tried changing the cuDNN version from 2.0 to 6.0, re-installing CuPy & Chainer for every cuDNN change. I use a GTX 970 or 1080 and have tried both Linux and Windows, but all the results are green. How can I fix it? Actually, your demo website has been updated, so I think you have already fixed this problem for some newer graphics cards or cuDNN versions, or there is something I do not know. If you have any solutions, please teach me. Thank you.

gjghks avatar Dec 26 '17 00:12 gjghks

Does our pre-trained model work in your environment? If so, the input data format read by OpenCV can be one of the problems, or there may be some problem in your training process.

taizan avatar Dec 26 '17 05:12 taizan

You can use some standard deep learning tools as your starting point:

pix2pix: https://github.com/phillipi/pix2pix
cycleGAN: https://github.com/junyanz/CycleGAN

Chainer versions:
pix2pix: https://github.com/pfnet-research/chainer-pix2pix
cycleGAN: https://github.com/Aixile/chainer-cyclegan

lllyasviel avatar Dec 26 '17 08:12 lllyasviel

@taizan Thank you for the reply, taizan! Your pre-trained model works fine. The image below is my green result using my own dataset. (attached image: all-green output)

As you said, then this should not be a problem with the GPU or cuDNN version? The images below are the original color and line images (128x128) used for training. I tried both JPG and PNG files, but the results are the same green images. (attached: color and line training images)

And the image below is the RGB-to-YUV converted image. (attached: YUV image)

I use OpenCV 3.3.0. What do you think about my situation? I hope you can help me.

gjghks avatar Dec 26 '17 08:12 gjghks

@lllyasviel Thank you for the reply! I will fix the problem first and then check the sites you mentioned. I am interested in deep learning, especially GANs. Thanks for your kindness.

gjghks avatar Dec 26 '17 09:12 gjghks