
Request help with training

neuralchen opened this issue 11 months ago · 10 comments

Recently, I reproduced the OOTD training code and trained the model. I would like to ask how low the loss should go to be considered normal. My loss was 1.2 at the beginning, and it has been hovering around 0.2 after more than 8,000 training steps. Is this normal?
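
(For context: the loss here is presumably the standard noise-prediction MSE of latent diffusion. A toy sketch, with all names and shapes illustrative rather than my actual training code, of why it starts near 1.0 for an untrained predictor:)

```python
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler

# Diffusion training minimizes the MSE between predicted and true noise.
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
latents = torch.randn(1, 4, 64, 48)        # stand-in for VAE latents
noise = torch.randn_like(latents)
timesteps = torch.randint(0, 1000, (1,))
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
# model_pred = unet_vton(noisy_latents, timesteps, ...).sample  # real forward pass
model_pred = torch.zeros_like(noise)       # placeholder prediction
loss = F.mse_loss(model_pred, noise)       # ~1.0: the variance of unit Gaussian noise
```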

The following pictures are the validation results at 2,000 steps. It seems that the model cannot learn the correct texture and style of the clothes. I would like to ask the author: is this normal? Hoping for a reply, thanks!

[12 validation images at step 2000]

neuralchen · Mar 23 '24 16:03

After 7000 steps of training, it seems to be working. Here are the latest results:

[8 validation images at step 7000]

neuralchen · Mar 24 '24 08:03

Nice results! Would you mind sharing the training code?

filippocastelli · Mar 24 '24 11:03

I also tried to write the training code, but when I run it on an A100 40G, I get a CUDA OOM error. How much GPU memory do you have, or did you use some alternative to reduce memory usage?

Thanks.

edith-wq · Mar 25 '24 00:03

After 14,000 steps of training, I got these results: [11 validation images at step 14000]

neuralchen · Mar 25 '24 13:03

I also tried to write the training code, but when I run it on an A100 40G, I get a CUDA OOM error. How much GPU memory do you have, or did you use some alternative to reduce memory usage?

Thanks.

It takes ~45 GB for a batch size of 1 when training in fp16. Instead of training both UNets in one go, you can try training them one at a time (at least in the case of fine-tuning).
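
A minimal sketch of that two-stage idea, assuming `unet_garm` and `unet_vton` are the two loaded UNets (names are illustrative):

```python
import torch

# Freeze the try-on UNet and train only the garment UNet first, then swap.
# This halves the trainable parameters and, with AdamW, roughly halves
# the optimizer-state memory that dominates the OOM.
unet_vton.requires_grad_(False)
unet_garm.requires_grad_(True)
optimizer = torch.optim.AdamW(unet_garm.parameters(), lr=1e-5)
```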

elenakovacic · Mar 27 '24 13:03

Nice results, @neuralchen. Did you continue training past 14k steps as well? Are you also training in fp16 only?

elenakovacic · Mar 27 '24 13:03

@elenakovacic You need to configure accelerate and enable DeepSpeed.
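
For example, DeepSpeed can also be enabled directly from Python instead of the interactive `accelerate config` step; a minimal sketch (the ZeRO stage and accumulation values are illustrative, not confirmed settings):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO stage 2 shards optimizer state and gradients across GPUs, which is
# usually what relieves the OOM from two large UNets plus AdamW state.
ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=ds_plugin)
```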

neuralchen · Mar 27 '24 14:03

Oh, I see. I've never used DeepSpeed before; let me check. Can we resume training in fp16 using DeepSpeed? I mean, if we load the model in fp16, accelerate will not work. Can we do it using DeepSpeed?

elenakovacic · Mar 27 '24 14:03

Models are trained in fp16

neuralchen · Mar 27 '24 14:03

Hugging Face accelerate allows only one model to be used with DeepSpeed, but I think we have two models: 1) the garment UNet and 2) the VTON UNet. How can I use both with DeepSpeed in accelerate? Did you combine the garment UNet and the VTON UNet into one model?
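
Something like the following wrapper is what I have in mind (a sketch of the structure only, not the authors' confirmed approach; the `garment_features` keyword and the forward wiring are hypothetical):

```python
import torch.nn as nn

class CombinedUNets(nn.Module):
    """Hypothetical wrapper so accelerate/DeepSpeed receives a single model."""
    def __init__(self, unet_garm, unet_vton):
        super().__init__()
        self.unet_garm = unet_garm
        self.unet_vton = unet_vton

    def forward(self, garm_kwargs, vton_kwargs):
        # Illustrative wiring only: how the garment UNet's features are
        # injected into the try-on UNet depends on the OOTD implementation.
        garment_features = self.unet_garm(**garm_kwargs)
        return self.unet_vton(**vton_kwargs, garment_features=garment_features)
```

Passing `CombinedUNets(unet_garm, unet_vton)` to `accelerator.prepare` would then satisfy DeepSpeed's single-model expectation.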

failbetter77 · Mar 27 '24 15:03

Models are trained in fp16

Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."?

chenbinghui1 · Apr 01 '24 11:04

Models are trained in fp16

Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."?

Facing the same issue.

elenakovacic · Apr 01 '24 11:04

Models are trained in fp16

Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."?

Facing the same issue.

I solved this problem by: [1] creating all the models with the float32 data type:

```python
vae = AutoencoderKL.from_pretrained(
    VAE_PATH, subfolder="vae"
)
unet_garm = UNetGarm2DConditionModel.from_pretrained(
    UNET_PATH, subfolder="unet_garm_train", use_safetensors=True
)
unet_vton = UNetVton2DConditionModel.from_pretrained(
    UNET_PATH, subfolder="unet_vton_train", use_safetensors=True
)
```

[2] If you use `accelerator.prepare(unet_garm, unet_vton)`, do not manually move the models to the GPU with something like `unet_garm.to(accelerator.device, dtype=weight_dtype)`.
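
A condensed sketch of that pattern (mirroring the snippet above; `optimizer` is assumed to be defined):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")

# Trainable weights stay in float32: the GradScaler needs fp32 master weights
# to unscale into, while autocast runs the forward pass in fp16.
unet_garm, unet_vton, optimizer = accelerator.prepare(unet_garm, unet_vton, optimizer)

# Do NOT cast trainable weights to fp16 -- that is exactly what triggers
# "Attempting to unscale FP16 gradients.":
# unet_garm.to(accelerator.device, dtype=torch.float16)
```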

chenbinghui1 · Apr 01 '24 13:04

This is not a solution; this will train in float32 instead of float16.

elenakovacic · Apr 01 '24 13:04

This is not a solution; this will train in float32 instead of float16.

I don't know, but the official ControlNet training code in diffusers uses the same strategy; you can have a look.

chenbinghui1 · Apr 01 '24 13:04

The author must have trained for more than 30,000 steps. The following are my results at 30,000 steps: [11 validation images at step 30000]

neuralchen · Apr 01 '24 14:04

@neuralchen Good progress.

Do you think your results might be different because of using different training hyperparameters or data augmentation techniques?

i-amgeek · Apr 01 '24 19:04

The author must have trained for more than 30,000 steps. The following are my results at 30,000 steps: [11 validation images at step 30000]

Hello @neuralchen, I reproduced the OOTD training code and trained it, but I found that my model overfits when trained on the VITON-HD data. The results on the training set are good, but the results on the test set are bad. Maybe training two UNets on a ~10k-image training set tends to overfit? Have you ever encountered this problem?

Aaron2117 · Apr 06 '24 14:04

How many epochs did you train it for, @Aaron2117? The author's results are also bad on some images sometimes, but if yours are bad on almost all test images, then there must be some issue.

elenakovacic · Apr 06 '24 22:04

Can anyone share the training code?

appleyang123 · Apr 07 '24 11:04

@Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the checkpoint released by the author are very different from those in the paper.

neuralchen · Apr 07 '24 11:04

@elenakovacic 300 epochs. I ran inference on the training set and the results are OK, but on the test set they are bad.

Aaron2117 · Apr 09 '24 02:04

@neuralchen I trained for 300 epochs on the VITON-HD dataset. The image size is 512×384, and the loss is about 0.03. Can you tell me what your final loss was at the end?

Aaron2117 · Apr 09 '24 02:04

@Aaron2117 300 epochs sounds like too many. We only trained for 36,000 steps (~42 epochs).

levihsu · Apr 09 '24 03:04

@Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the checkpoint released by the author are very different from those in the paper.

@neuralchen Could you tell us the reason? The training hyperparameters are exactly the same as those in our paper. Our model was trained for 36,000 steps with a batch size of 16 at 1024×768.

levihsu · Apr 09 '24 03:04

@neuralchen Could you tell us the reason? The training hyperparameters are exactly the same as those in our paper. Our model was trained for 36,000 steps with a batch size of 16 at 1024×768.

Hey, how were you able to use a batch size of 16 at 1024×768 on a single A100 80GB? My script can only fit a batch size of 2.

@Aaron2117 @neuralchen What batch size are you using?
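
For now I'm considering gradient accumulation to reach the same effective batch size (a sketch; `compute_loss`, `train_dataloader`, and `optimizer` are assumed helpers, and the factor of 8 is illustrative):

```python
from accelerate import Accelerator

# Per-device batch size 2 × 8 accumulation steps = effective batch size 16.
accelerator = Accelerator(gradient_accumulation_steps=8, mixed_precision="fp16")

for batch in train_dataloader:
    with accelerator.accumulate(unet_vton):
        loss = compute_loss(batch)       # assumed per-step loss helper
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```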

elenakovacic · Apr 09 '24 18:04

On the VITON-HD dataset, I also faced the problem that the results on the training data are good but not good on the test data. Was the open-source model provided by the author trained on the test dataset?

@Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the checkpoint released by the author are very different from those in the paper.

appleyang123 · Apr 13 '24 09:04

On the VITON-HD dataset, I also faced the problem that the results on the training data are good but not good on the test data. Was the open-source model provided by the author trained on the test dataset?

@Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the checkpoint released by the author are very different from those in the paper.

@appleyang123 Of course not. Don't you think it is ridiculous to train the model on the test data? That is a very serious accusation. The 30,000-step results of @neuralchen on the test data already look decent, though still worse than our 36,000-step checkpoints. And how would an overfitted model work on customized images from outside the dataset? Did you even see their results or try our demo? And did you carefully check the correctness of your code and training process? Your question is very impolite and unprofessional. We will release the training scripts later; until then, you can refer to others' implementations. Thank you.

levihsu · Apr 13 '24 16:04

Hi, I have checked the results. I guess the public model was trained at a resolution of 768×1024: inference on the test data at 768×1024 gives good results, while the results at 384×512 are inferior. [Images: test results at 768×1024 and at 384×512]

In my own experiment, I got similar results. I trained at a resolution of 384×512 for 14,000 steps, and the results on the test data at 384×512 are much better than at 768×1024. [Images: test results at 384×512 and at 768×1024]
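
If that is the cause, resizing inputs to the checkpoint's training resolution before inference should help; a trivial sketch (the file name is a placeholder):

```python
from PIL import Image

# PIL's resize takes (width, height); match the checkpoint's training resolution.
img = Image.open("person.jpg").convert("RGB").resize((768, 1024), Image.LANCZOS)
```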

appleyang123 · Apr 15 '24 02:04