img2img-turbo
Training time of Unpaired Day2Night
Thanks for your impressive work! I'm curious about the training time required for the Unpaired Day2Night model on the BDD100K dataset.
It took me around 1 day on a single A100 GPU for only 1k steps. This training time seems quite long, and I suspect something is wrong that is causing it to take so much time (a quick calculation follows the parameter list below). Could you help me figure out what's going on?
Here are some of the training parameters:
--learning_rate="1e-5" --max_train_steps=25000
--train_batch_size=1 --gradient_accumulation_steps=1
--enable_xformers_memory_efficient_attention --validation_steps 250
--lambda_gan 0.5 --lambda_idt 1 --lambda_cycle 1
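At that rate, the full run would take far longer than expected. A quick back-of-the-envelope check, using only the figures above (so the numbers are approximate):

```python
# Rough estimate of per-step time and total training time,
# based only on the figures reported above (~1 day for ~1,000 steps).
seconds_per_day = 24 * 3600
observed_steps = 1_000
max_train_steps = 25_000

sec_per_step = seconds_per_day / observed_steps                 # ~86.4 s per step
total_days = max_train_steps * sec_per_step / seconds_per_day   # ~25 days for the full run

print(f"~{sec_per_step:.1f} s/step, ~{total_days:.0f} days for {max_train_steps} steps")
```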
I'm also training BDD100K Day2Night. It takes about 16 hours on four 4090 GPUs to complete 25,000 training steps.
@ShenZheng2000 Did you train with 512x512? Does it fit into the 24 GB of a 4090?
@tfriedel
No, training at 512x512 resolution doesn't fit into 24 GB of memory; it may require 48 GB or more.
However, I experimented with the resize_286_randomcrop_256x256_hflip preprocessing (roughly sketched after this list) and found that:
- it fits within 24 GB
- training takes around 16 hours to complete
- the reproduced results are promising, similar to those of the released pretrained model.
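For reference, here is a rough sketch, in torchvision terms, of what a resize_286_randomcrop_256x256_hflip pipeline usually means; the actual implementation in this repo may differ in interpolation mode and normalization:

```python
# Sketch of a "resize to 286, random-crop 256x256, horizontal flip" pipeline.
# Interpolation mode and normalization here are assumptions, not copied from the repo.
import torchvision.transforms as T

train_transform = T.Compose([
    T.Resize(286),                  # resize the shorter side to 286 (default bilinear)
    T.RandomCrop(256),              # random 256x256 crop
    T.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    T.ToTensor(),                   # convert PIL image to a [0, 1] tensor
])
```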
@ShenZheng2000 Did you get good results even with this bug present: https://github.com/GaParmar/img2img-turbo/issues/56? I applied the PR for it on a custom dataset, but training stopped improving after about 2000 steps. I think the bugfix may either be incomplete, or I shouldn't have used gradient accumulation. I have 4 GPUs and, according to the paper, they used a batch size of 8, so I thought that with gradient accumulation steps = 2 I'd get the same effective batch size (see the quick check below). Or maybe my dataset is just more challenging.
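To make the effective batch size reasoning explicit, a minimal check using only the numbers mentioned in this thread:

```python
# Effective (global) batch size when combining data parallelism with
# gradient accumulation -- a sanity check, not code from this repo.
num_gpus = 4
train_batch_size = 1               # per-GPU batch size
gradient_accumulation_steps = 2

effective_batch_size = num_gpus * train_batch_size * gradient_accumulation_steps
print(effective_batch_size)        # 8, matching the batch size reported in the paper
```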