IDM-VTON

Training questions

Open nom opened this issue 9 months ago • 11 comments

Hey, great work! Quick question on training.

I was wondering how you're fitting two SDXL UNets (garment UNet and tryon UNet) on a single A800 with batch size 24/4 = 6 (assuming 4xA800 in total). I see you're using FP16 models, but are you doing any optimizations to bring memory down, like precomputing embeddings/features, 8-bit Adam, or gradient accumulation? I'm trying to reproduce training, but I can only fit 3 samples at 1024x768 resolution in 80 GB of VRAM, and a single step takes ~1.3 seconds on an H100. I'm already using some of these tricks (8-bit Adam, precomputed VAE embeddings, frozen garment UNet).

Also curious about training speed if you can share. Thanks!

nom avatar May 02 '24 00:05 nom
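
For readers following the batch-size arithmetic in the question above, here is a minimal sketch of how gradient accumulation could recover the same effective batch size when only 3 samples fit per GPU. The global batch of 24, the 4 GPUs, and the micro-batch of 3 come from the post; the function name `accumulation_steps` is hypothetical and not part of any IDM-VTON code.

```python
import math


def accumulation_steps(global_batch: int, num_gpus: int, micro_batch: int) -> int:
    """Micro-batches each GPU must accumulate before one optimizer step."""
    per_gpu = global_batch / num_gpus          # samples each GPU contributes per step
    return math.ceil(per_gpu / micro_batch)    # round up if it does not divide evenly


per_gpu_batch = 24 // 4                        # 6 samples per GPU, as in the post
steps = accumulation_steps(global_batch=24, num_gpus=4, micro_batch=3)
print(per_gpu_batch, steps)                    # 6 2
```

With a micro-batch of 3, two accumulated micro-batches per GPU would match a per-GPU batch of 6, at the cost of roughly twice the wall-clock time per optimizer step.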

Hello, we used gradient checkpointing and 8-bit Adam for training, which let us fit batch size 6 on a single A100 GPU. We didn't precompute latents or embeddings and didn't use gradient accumulation, but you can use them to reduce memory cost. Training took around 1-2 days on 4xA100 GPUs for 63k iterations.

yisol avatar May 02 '24 16:05 yisol
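
As a rough sanity check on the numbers in the reply above, the sketch below converts "1-2 days for 63k iterations" into seconds per iteration. The only inputs are the figures quoted in the thread; the global batch of 24 (6 per GPU on 4 GPUs) is taken from the question and is an assumption about the actual run. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_iteration(days: float, iterations: int) -> float:
    """Average wall-clock seconds per training iteration."""
    return days * SECONDS_PER_DAY / iterations


fast = seconds_per_iteration(1, 63_000)   # lower bound: training took 1 day
slow = seconds_per_iteration(2, 63_000)   # upper bound: training took 2 days
total_samples = 63_000 * 24               # assumes a global batch of 24

print(f"{fast:.2f} to {slow:.2f} s/iteration")  # 1.37 to 2.74 s/iteration
print(total_samples)                            # 1512000
```

Under these assumptions, the reported run averaged about 1.4 to 2.7 seconds per iteration at 6 samples per GPU, which gives a reference point for the ~1.3 seconds per step at 3 samples mentioned in the question.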

Thanks @yisol. Are you perhaps not doing EMA?

Also if you could share a work-in-progress rough train script here, that'd be really helpful - just to get a better understanding of the differences with mine, doesn't have to be a working script.

nom avatar May 02 '24 16:05 nom

Did you use noise_offset or snr_gamma (=5) during training?

ifeherva avatar May 05 '24 18:05 ifeherva
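
For context on the question above, here is a minimal pure-Python sketch of what Min-SNR loss weighting with `snr_gamma = 5` computes for an epsilon-prediction model. The thread does not say whether the authors used it. The schedule values (1000 steps, scaled-linear betas from 0.00085 to 0.012) are assumptions based on common Stable Diffusion defaults, and the function names are hypothetical.

```python
def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for a scaled-linear schedule (assumed values)."""
    out, prod = [], 1.0
    for i in range(num_steps):
        frac = i / (num_steps - 1)
        # Scaled-linear: interpolate linearly in sqrt(beta), then square.
        beta = (beta_start ** 0.5 + frac * (beta_end ** 0.5 - beta_start ** 0.5)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out


def min_snr_weight(alpha_bar, snr_gamma=5.0):
    """Per-timestep loss weight min(SNR, gamma) / SNR for epsilon prediction."""
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, snr_gamma) / snr


weights = [min_snr_weight(a) for a in alphas_cumprod()]
print(weights[0], weights[-1])  # low-noise steps are down-weighted, high-noise steps keep weight 1
```

The weights multiply the per-sample MSE loss at the sampled timestep, so low-noise timesteps with very high SNR contribute less to the gradient.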

@yisol It would be really helpful indeed.

cardosofelipe avatar May 16 '24 14:05 cardosofelipe

Hey @nom, I am trying to replicate the training. It would be great if you could share a glimpse of your script; even a rough idea would help.

Anustup900 avatar May 17 '24 05:05 Anustup900

@nom can you share the fine-tune code with me?

awzhgw avatar May 17 '24 10:05 awzhgw

@nom can you share the training or fine-tune code with me?

jasonaidm avatar May 23 '24 06:05 jasonaidm

@nom can you share the fine-tune code?

thuc248997 avatar May 24 '24 03:05 thuc248997

@nom can you share the fine-tune code?

awzhgw avatar Jun 01 '24 11:06 awzhgw

@nom can you share the fine-tune code?

ttjygbtj22 avatar Jun 09 '24 12:06 ttjygbtj22

I wrote unofficial training code here. I'm still testing it. Please try it if you like: https://github.com/nftblackmagic/IDM-VTON-training

nftblackmagic avatar Jun 17 '24 15:06 nftblackmagic