img2img-turbo
Training time of Unpaired Day2Night
Thanks for your impressive work! I'm curious about the training time required for the Unpaired Day2Night model on the BDD100K dataset.
It took me around 1 day on a single A100 GPU for only 1k steps. This training time seems quite long, and I suspect something is wrong that is causing it to take so much time (a quick calculation follows the parameter list below). Could you help me figure out what's going on?
Here are some of the training parameters:
--learning_rate="1e-5" --max_train_steps=25000
--train_batch_size=1 --gradient_accumulation_steps=1
--enable_xformers_memory_efficient_attention --validation_steps 250
--lambda_gan 0.5 --lambda_idt 1 --lambda_cycle 1
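At that rate, the full run would take far longer than expected. A quick back-of-the-envelope check, using only the figures above (so the numbers are approximate):

```python
# Rough estimate of per-step time and total training time,
# based only on the figures reported above (~1 day for ~1,000 steps).
seconds_per_day = 24 * 3600
observed_steps = 1_000
max_train_steps = 25_000

sec_per_step = seconds_per_day / observed_steps                 # ~86.4 s per step
total_days = max_train_steps * sec_per_step / seconds_per_day   # ~25 days for the full run

print(f"~{sec_per_step:.1f} s/step, ~{total_days:.0f} days for {max_train_steps} steps")
```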
I'm also training BDD100K Day2Night. It takes about 16 hours on four 4090 GPUs to complete 25,000 training steps.
@ShenZheng2000 Did you train with 512x512? Does it fit into the 24 GB of a 4090?
@tfriedel
No, training at 512x512 resolution doesn't fit into 24 GB of memory; it may require 48 GB or more.
However, I experimented with the resize_286_randomcrop_256x256_hflip preprocessing (roughly sketched after this list) and found that:
- it fits within 24 GB
- training takes around 16 hours to complete
- the reproduced results are promising, similar to those of the released pretrained model.
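For reference, here is a rough sketch, in torchvision terms, of what a resize_286_randomcrop_256x256_hflip pipeline usually means; the actual implementation in this repo may differ in interpolation mode and normalization:

```python
# Sketch of a "resize to 286, random-crop 256x256, horizontal flip" pipeline.
# Interpolation mode and normalization here are assumptions, not copied from the repo.
import torchvision.transforms as T

train_transform = T.Compose([
    T.Resize(286),                  # resize the shorter side to 286 (default bilinear)
    T.RandomCrop(256),              # random 256x256 crop
    T.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    T.ToTensor(),                   # convert PIL image to a [0, 1] tensor
])
```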
@ShenZheng2000 Did you get good results even with this bug present: https://github.com/GaParmar/img2img-turbo/issues/56? I applied the PR for it on a custom dataset, but training stopped improving after about 2000 steps. I think the bugfix may either be incomplete, or I shouldn't have used gradient accumulation. I have 4 GPUs and, according to the paper, they used a batch size of 8, so I thought that with gradient accumulation steps = 2 I'd get the same effective batch size (see the quick check below). Or maybe my dataset is just more challenging.
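To make the effective batch size reasoning explicit, a minimal check using only the numbers mentioned in this thread:

```python
# Effective (global) batch size when combining data parallelism with
# gradient accumulation -- a sanity check, not code from this repo.
num_gpus = 4
train_batch_size = 1               # per-GPU batch size
gradient_accumulation_steps = 2

effective_batch_size = num_gpus * train_batch_size * gradient_accumulation_steps
print(effective_batch_size)        # 8, matching the batch size reported in the paper
```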