GANimation
training process too slow
Hello, I use a 1080Ti to train on 200,000 images, each 128*128. However, training is very slow: it takes me almost 4 hours to finish one epoch, which is different from what you report in your paper. Could you please tell me why?
I do not remember the time per iteration we had. Check the wait time for a new batch; I strongly believe ours was 0, and if yours is not zero it is definitely slowing you down. If that is the case, the easy first fix to try is increasing the number of data-loading threads. If the wait is still not 0, what I would do is run a profiler across the data loader and identify the bottlenecks. Sorry, I do not have more insights.
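Something along these lines is what I mean by checking the wait time (a generic PyTorch sketch, not the repo's code; `dataset`, the batch size and the worker count are placeholders):

    import time
    from torch.utils.data import DataLoader

    def measure_batch_wait(dataset, batch_size=25, num_workers=4, n_batches=50):
        loader = DataLoader(dataset, batch_size=batch_size,
                            num_workers=num_workers, shuffle=True)
        waits = []
        end = time.time()
        for i, batch in enumerate(loader):
            waits.append(time.time() - end)   # time spent waiting for this batch
            # ... the forward/backward pass would normally go here ...
            end = time.time()
            if i + 1 >= n_batches:
                break
        print('mean wait per batch: %.4f s' % (sum(waits) / len(waits)))

If the wait stays high after raising the number of threads, profiling the dataset's __getitem__ (for example with cProfile) usually points to the slow step (disk I/O, image decoding, etc.).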
After the training process finished, I tested the model, and the result is below.
I don't know what happened to my dataset.
I use
./bin/FaceLandmarkImg -nomask -simsize 128 -fdir
I use

    import cv2 as cv
    import numpy as np
    from PIL import Image

    def isgray(convert_img):
        """Return True if the image looks grayscale, False if it is in color."""
        im = Image.open(convert_img)
        if im.mode != 'RGB':
            return True                       # single-channel images are treated as gray
        chip = cv.imread(convert_img)
        # note: cv.imread returns BGR, but the measure below is symmetric in the channels
        r, g, b = cv.split(chip)
        r = r.astype(np.float32)
        g = g.astype(np.float32)
        b = b.astype(np.float32)
        s_w, s_h = r.shape[:2]
        x = (r + g + b) / 3                   # per-pixel channel mean
        r_gray = abs(r - x)
        g_gray = abs(g - x)
        b_gray = abs(b - x)
        r_sum = np.sum(r_gray) / (s_w * s_h)
        g_sum = np.sum(g_gray) / (s_w * s_h)
        b_sum = np.sum(b_gray) / (s_w * s_h)
        gray_degree = (r_sum + g_sum + b_sum) / 3
        return gray_degree < 10               # small channel deviation => gray image
to filter gray images.
I use
    import cv2 as cv

    threshold = 400
    # img2: the candidate image; variance of the Laplacian as a sharpness score
    res_ratio = cv.Laplacian(img2, cv.CV_64F).var()
    is_sharp = res_ratio > threshold          # low variance means the image is blurry
to filter low-resolution (blurry) images.
OpenFace generates images in BMP format, so I use
    from PIL import Image

    def bmp2jpg(origin_img):
        """Convert a .bmp file to .jpg next to it and return the new path."""
        img = Image.open(origin_img)
        convert_img = origin_img[:-4] + '.jpg'   # swap the 4-char .bmp extension
        img.save(convert_img)
        return convert_img
to save bmp images as jpg images.
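Roughly, I chain the three helpers like this (the folder name is just a placeholder for where the OpenFace crops are):

    import glob
    import os
    import cv2 as cv

    kept = []
    for bmp_path in glob.glob('aligned/*.bmp'):       # placeholder folder with the OpenFace output
        jpg_path = bmp2jpg(bmp_path)
        gray = cv.imread(jpg_path, cv.IMREAD_GRAYSCALE)
        blurry = cv.Laplacian(gray, cv.CV_64F).var() <= 400
        if isgray(jpg_path) or blurry:
            os.remove(jpg_path)                       # drop grayscale or blurry crops
        else:
            kept.append(os.path.basename(jpg_path))
    print(len(kept), 'images kept')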
I prepared 200,000 images: 100,000 image names are written in train_ids.csv, 100,000 in test_ids.csv, and all images are placed in imgs.
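The split itself is just a half/half split over the image names (a simplified sketch, assuming each id file simply lists one filename per line):

    import glob
    import os
    import random

    names = sorted(os.path.basename(p) for p in glob.glob('imgs/*.jpg'))
    random.seed(0)
    random.shuffle(names)
    half = len(names) // 2
    with open('train_ids.csv', 'w') as f:
        f.write('\n'.join(names[:half]))
    with open('test_ids.csv', 'w') as f:
        f.write('\n'.join(names[half:]))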
Could you please tell me what's wrong with my dataset?
It seems more like a problem of not being able to load the weights. This is the usual output for random weights.
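A quick way to rule that out (just a sanity check, not something from the repo) is to load the .pth directly and look at what it contains:

    import torch

    ckpt = torch.load('./checkpoints/experiment_1/net_epoch_30_id_G.pth',
                      map_location='cpu')
    print(type(ckpt))
    if isinstance(ckpt, dict):
        for name, value in list(ckpt.items())[:5]:
            if torch.is_tensor(value):
                # layer name, shape and mean absolute value of the stored weights
                print(name, tuple(value.shape), float(value.abs().mean()))
            else:
                print(name, type(value))

If load_state_dict later reports missing or unexpected keys, or the stored tensors look like a freshly initialized network, the test run is effectively using random weights.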
This is the result after I run python test.py. Are there any problems?

    ------------ Options -------------
    aus_file: aus_openface.pkl
    batch_size: 4
    checkpoints_dir: ./checkpoints
    cond_nc: 17
    data_dir: None
    dataset_mode: aus
    do_saturate_mask: False
    gpu_ids: [0]
    image_size: 128
    images_folder: imgs
    input_path: /home/zhutao/N_0000000203_00647.jpg
    is_train: False
    load_epoch: 30
    model: ganimation
    n_threads_test: 1
    name: experiment_1
    output_dir: ./output
    serial_batches: False
    test_ids_file: test_ids.csv
    train_ids_file: train_ids.csv
    -------------- End ----------------
    ./checkpoints/experiment_1
    Network generator_wasserstein_gan was created
    Network discriminator_wasserstein_gan was created
    loaded net: ./checkpoints/experiment_1/net_epoch_30_id_G.pth
    Model GANimation was created
"input_path: /home/zhutao/N_0000000203_00647.jpg" This does not look good at all. Are you training with only one image?
No. I mean I use bash launch/run_train.sh to train the model; there are 200,000 images in /dataset/imgs.
Then I use python test.py --input_path /home/zhutao/N_0000000203_00647.jpg to test the model.
And the param data_dir in run_train.sh is pointing to your dataset? While training, the code generates an events file that can be visualized with TensorBoard. Check there that the images in every batch are the ones in your dataset.
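If you want a quicker check than TensorBoard, something like this dumps the first training batch to disk (just a sketch; `data_loader` and the `'real_img'` key are placeholders for whatever the training loop actually uses):

    import torchvision.utils as vutils

    batch = next(iter(data_loader))                              # the training DataLoader
    imgs = batch['real_img'] if isinstance(batch, dict) else batch[0]
    vutils.save_image(imgs, 'first_batch.png', normalize=True)   # grid of the batch images

If the saved grid does not show faces from your 200,000 images, the problem is in the data pipeline rather than the network.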
Yes, the param data_dir in run_train.sh is pointing to my dataset. And these are the files after 30 epochs of training.
I think I've loaded net: ./checkpoints/experiment_1/net_epoch_30_id_G.pth successfully when testing the model.
And this is the loss during training.
@ZHUTAO142857 You should add the -aus parameter when running FaceLandmarkImg. The generated .csv file will be different! See data/prepare_au_annotations.py.
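In other words, the AU intensities have to end up in aus_openface.pkl. Very roughly, the preparation looks like the sketch below; the column handling and the /5 scaling are my assumptions, so check data/prepare_au_annotations.py for the real script:

    import glob
    import os
    import pickle
    import pandas as pd

    aus = {}
    for csv_path in glob.glob('processed/*.csv'):                 # OpenFace output run with -aus
        df = pd.read_csv(csv_path, skipinitialspace=True)
        au_cols = [c for c in df.columns if c.startswith('AU') and c.endswith('_r')]
        name = os.path.splitext(os.path.basename(csv_path))[0]
        aus[name] = df[au_cols].iloc[0].to_numpy() / 5.0          # 17 AU intensities scaled to [0, 1]

    with open('aus_openface.pkl', 'wb') as f:
        pickle.dump(aus, f)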
Hello, I use a 1080Ti to train on 200,000 images, each 128*128. However, training is very slow: it takes me almost 4 hours to finish one epoch, which is different from what you said in your paper. Could you please tell me why? Can you share the 200,000 images with me? Thanks!
I think you got the wrong email dude, sorry.