
How much time did it take to train on FFHQ?

Open shionhonda opened this issue 3 years ago • 5 comments

Thanks for this great work!

I'm trying to train e4e from scratch on the face domain (mostly the same as FFHQ, but at 512x512 resolution). It has now been trained for 100k steps, and the reconstruction results look fine so far. The problem is that training proceeds very slowly: it is estimated to take more than one week to reach 300k steps on a single Tesla T4 GPU. I keep the validation set size at 1,000, so the time spent on evaluation is negligible.
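(For reference, the one-week figure is just an extrapolation from measured throughput; a minimal sketch, where the elapsed time is a hypothetical number you'd replace with your own measurement:)

```python
# Back-of-envelope ETA from measured throughput.
# elapsed_hours is hypothetical; plug in your own wall-clock measurement.
steps_done = 100_000
target_steps = 300_000
elapsed_hours = 90.0  # hypothetical

sec_per_step = elapsed_hours * 3600 / steps_done
remaining_days = (target_steps - steps_done) * sec_per_step / 86_400
print(f"~{sec_per_step:.2f} s/step, ~{remaining_days:.1f} days to reach {target_steps} steps")
```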

My questions are as follows:

  • How much time did it take to train e4e on FFHQ? (The paper says it took 3 days on a P40 GPU for the cars domain, so would it be 1-2 weeks for FFHQ at 1024x1024?)
  • Do you happen to have any loss curves from when you trained e4e? I'd like to know whether we could stop training early (say, at 200k steps). I confirmed that 100k steps, at least, were not enough: both reconstruction and editing performed poorly.
  • Do you have any ideas for reducing the training time? I suspect we could increase the learning rate to around 0.001 to make the model converge faster.

I know I should experiment with these myself, but each trial takes a long time, so any suggestions would help. I appreciate your kind reply.

shionhonda avatar Apr 20 '22 08:04 shionhonda

It took 5 days to train on FFHQ 1024x1024 (500k iterations) on a 3090.

caopulan avatar Aug 30 '22 09:08 caopulan

@caopulan Thanks for sharing! Did you mean 50,000 iterations?

shionhonda avatar Aug 30 '22 13:08 shionhonda

Do you have any ideas for reducing the training time? I suspect we could increase the learning rate to around 0.001 to make the model converge faster.

In my experiment, the learning rate could safely be raised to 0.001.
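For anyone else trying this, the change is just a flag on the training command; a sketch assuming the entry point and flag names from the public e4e codebase (scripts/train.py, where learning_rate defaults to 1e-4; the exp_dir path is hypothetical):

```
python scripts/train.py \
  --dataset_type ffhq_encode \
  --exp_dir experiments/ffhq_e4e_lr1e3 \
  --learning_rate 0.001
```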

shionhonda avatar Aug 30 '22 13:08 shionhonda

@caopulan Thanks for sharing! Did you mean 50,000 iterations?

I'm sorry, I meant 500,000 iterations. I have corrected it.

caopulan avatar Aug 30 '22 13:08 caopulan

Also, I found that distributed training is not very effective: training on 8 cards gives only a 1.5-2x speedup.
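For reference, by distributed training I mean the standard PyTorch DDP setup; a minimal sketch, not the actual e4e training loop (the model and batch below are stand-ins):

```python
# Minimal PyTorch DistributedDataParallel sketch (illustration only).
# Launch with: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(512, 512).cuda(local_rank)  # stand-in for the encoder
model = DDP(model, device_ids=[local_rank])

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(1000):
    x = torch.randn(8, 512, device=local_rank)  # stand-in batch
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()  # gradient all-reduce happens here; this per-step sync is
                     # part of why 8 cards gave only a 1.5-2x speedup in my runs
    opt.step()
dist.destroy_process_group()
```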

caopulan avatar Aug 30 '22 13:08 caopulan