OOTDiffusion
Request help with training
Recently I reproduced the OOTD training code and ran training. I would like to ask what loss value is considered normal. My loss started at 1.2 and has been hovering around 0.2 after more than 8,000 training steps. Is this normal?
The following pictures are the validation results at 2,000 steps. The model does not seem to learn the correct texture and style of the clothes. I would like to ask the author: is this normal? Hope to get a reply, thanks!
After 7,000 steps of training, it seems to be working. Here are the latest results:
Nice results! Would you mind sharing the training code?
I also tried to write the training code, but when I run it on an A100 40G I get a CUDA OOM error. How much GPU memory do you have, or did you use some other technique to reduce memory usage?
Thanks.
After 14,000 steps of training, I got these results:
> I also tried to write the training code, but when I run it on an A100 40G I get a CUDA OOM error. How much GPU memory do you have, or did you use some other technique to reduce memory usage?
It takes ~45 GB for a batch size of 1 when training in fp16. Instead of training both UNets in one go, you can try training them one by one (at least when fine-tuning).
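For reference, a minimal sketch of the one-by-one approach (the `unet_garm`/`unet_vton` names match the loading snippet further down this thread; the optimizer choice and learning rate are placeholders, not the authors' actual script):

```python
import torch

# Freeze the garment UNet and fine-tune only the VTON UNet;
# swap the two roles in a second pass to train the other one.
unet_garm.requires_grad_(False)
unet_garm.eval()

unet_vton.requires_grad_(True)
unet_vton.train()

# Only the trainable parameters go into the optimizer, which
# roughly halves the optimizer-state memory compared to
# training both UNets at once.
optimizer = torch.optim.AdamW(unet_vton.parameters(), lr=1e-5)
```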
Nice results @neuralchen, did you continue training after 14k steps as well? Are you also training it in fp16 only?
@elenakovacic you need to configure accelerate and enable DeepSpeed.
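If it helps, here is a minimal sketch of enabling DeepSpeed programmatically through accelerate's `DeepSpeedPlugin` (the ZeRO stage and other values are assumptions on my side, not a verified config for OOTD):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO stage 2 shards optimizer states and gradients across
# processes, cutting the per-GPU memory for the optimizer.
deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=2,
    gradient_accumulation_steps=1,
)

accelerator = Accelerator(
    mixed_precision="fp16",
    deepspeed_plugin=deepspeed_plugin,
)
```

Alternatively, run `accelerate config` and enable DeepSpeed in the interactive prompts.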
Oh, I see. I've never used DeepSpeed before; let me check. Can we resume training in fp16 using DeepSpeed? I mean, if we load the model in fp16, accelerate will not work. Can we do it with DeepSpeed?
Models are trained in fp16
Hugging Face accelerate allows only one model to be used with DeepSpeed, but I think we have two models: 1) the garment UNet and 2) the VTON UNet. How can I use DeepSpeed with accelerate here? Did you combine the garment UNet and the VTON UNet into a single model?
> Models are trained in fp16
Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."?
> Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."?
Facing the same issue.
> Hi, when training in fp16, have you encountered the issue "Attempting to unscale FP16 gradients."?
I solved this problem by:
[1] Creating all the models with the float32 data type:

```python
vae = AutoencoderKL.from_pretrained(
    VAE_PATH, subfolder="vae"
)
unet_garm = UNetGarm2DConditionModel.from_pretrained(
    UNET_PATH, subfolder="unet_garm_train", use_safetensors=True
)
unet_vton = UNetVton2DConditionModel.from_pretrained(
    UNET_PATH, subfolder="unet_vton_train", use_safetensors=True
)
```

[2] If you use `accelerator.prepare(unet_garm, unet_vton)`, do not manually move them to the GPU with something like `unet_garm.to(accelerator.device, dtype=weight_dtype)`.
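To make the pattern concrete, here is a sketch of how those two points fit together under accelerate mixed precision (variable names follow the snippet above; the optimizer settings are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")
weight_dtype = torch.float16

# The frozen VAE can safely be cast to fp16 on the device.
vae.requires_grad_(False)
vae.to(accelerator.device, dtype=weight_dtype)

# The trainable UNets stay in fp32 and are wrapped by
# accelerator.prepare(), which autocasts the forward pass to
# fp16 while keeping fp32 master weights; this is what avoids
# the "Attempting to unscale FP16 gradients." error.
optimizer = torch.optim.AdamW(
    list(unet_garm.parameters()) + list(unet_vton.parameters()),
    lr=1e-5,
)
unet_garm, unet_vton, optimizer = accelerator.prepare(
    unet_garm, unet_vton, optimizer
)
```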
This is not a solution; this will train it in float32 instead of float16.
> This is not a solution; this will train it in float32 instead of float16.
I'm not sure, but the official diffusers code for training ControlNet uses the same strategy; you can have a look.
The author must have trained for more than 30,000 steps. The following are the results of my 30,000 steps:
(result images)
@neuralchen Good progress.
Do you think your results might be different because of using different training hyperparameters or data augmentation techniques?
Hello @neuralchen, I reproduced the OOTD training code and trained it, but I found that my result overfits when training on VITON-HD. The results on the training set are good, but the results on the test set are bad. Maybe training two UNets on a ~10k-image training set tends to overfit? Have you ever encountered this problem?
How many epochs did you train it for, @Aaron2117? The author's results are also bad on some images sometimes, but if yours are bad on almost all test images, then there must be some issue.
Can anyone share the training code?
@Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the ckpt released by the author are very different from those in the paper.
@elenakovacic 300 epochs. I ran inference on the training set and the results are OK, but on the test set they are bad.
@neuralchen I trained for 300 epochs on the VITON-HD dataset. The image size is 512×384, and the loss is about 0.03. Can you tell me what your final loss was at the end?
@Aaron2117 300 epochs sounds like too many. We only trained for 36,000 steps (~42 epochs).
> @Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the ckpt released by the author are very different from those in the paper.
@neuralchen Could you tell us the reason? The training hyperparameters are exactly the same as those in our paper. Our model was trained for 36,000 steps with a batch size of 16 at 1024×768.
> Our model was trained for 36,000 steps with a batch size of 16 at 1024×768.
Hey, how were you able to use a batch size of 16 at 1024×768 on a single A100 80GB? My script can only fit a batch size of 2.
@Aaron2117 @neuralchen What batch size are you using?
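Not sure what the authors did, but one common way to reach an effective batch size of 16 when only 2 fit in memory is gradient accumulation. A sketch with accelerate (`compute_loss` is a hypothetical stand-in for the actual loss computation, and the optimizer is assumed to be prepared via `accelerator.prepare`):

```python
from accelerate import Accelerator

# 2 samples per forward pass x 8 accumulation steps = effective batch size 16.
accelerator = Accelerator(gradient_accumulation_steps=8, mixed_precision="fp16")

for batch in train_dataloader:  # dataloader with batch_size=2
    with accelerator.accumulate(unet_vton):
        loss = compute_loss(batch)  # hypothetical loss helper
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```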
> @Aaron2117 The results shown above are all from the test set. Our team believes that the training hyperparameters of the ckpt released by the author are very different from those in the paper.

On the VITON-HD dataset, I also faced the problem that the results on the training data are good but not good on the test data. Was the open-source model provided by the author trained on the test dataset?
@appleyang123 Of course not. Don't you think it is ridiculous to train the model on the test data? This is a very severe accusation. The 30,000-step results of @neuralchen on test data already look decent, though still worse than our 36,000-step checkpoints. And how could an overfitted model be used on customized images beyond the dataset? Did you even see their results or try our demo? And did you carefully check the correctness of your code and training process? Your question is very impolite and unprofessional. We will release the training scripts later, or you can refer to others' implementations before that. Thank you.
Hi, I have checked the results. I guess the public model was trained at resolution 768×1024, so running inference on the test data at 768×1024 gives good results, while results at 384×512 are inferior.
768×1024:
384×512:
In my own experiment, I have similar results. I trained at resolution 384×512 for 14,000 steps. The results on the test data at 384×512 are much better than at 768×1024.
384×512:
768×1024: