ColossalAI
[BUG]: ColossalAI/examples/images/diffusion: sampling results are poor
🐛 Describe the bug
The results of the sampling were poor
PLMS sampling: [image]
DPM sampling: [image]
Trained for 3000 epochs on the TEYVAT dataset.
Environment
dpm or plms sampling
python txt2img.py --prompt "Teyvat, Name:Keqing, Element:Electro, Weapon:Sword, Region:Liyue, Model type:Medium Female, Description:an anime character wearing a purple dress and cat ears." \
    --dpm \
    --outdir ./output/fashion \
    --ckpt /home/project/ColossalAI/examples/images/diffusion/logfiles/2023-02-13T01-50-34_train_colossalai_teyvat/diff_tb/version_0/checkpoints/epoch=2917-step=11672.ckpt \
    --config /home/project/ColossalAI/examples/images/diffusion/logfiles/2023-02-13T01-50-34_train_colossalai_teyvat/configs/2023-02-13T01-50-34-project.yaml \
    --n_samples 2
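To compare the two samplers on equal terms, it may help to build the same command twice and change only the sampler flag. The sketch below only assembles the argument lists and does not run anything. The `--dpm` flag and the other options are taken from the command above; the `--plms` flag, the helper name `build_txt2img_command`, and the placeholder paths are assumptions for illustration and should be checked against the example's txt2img.py.

```python
import shlex


def build_txt2img_command(prompt, ckpt, config, sampler="dpm",
                          outdir="./output/fashion", n_samples=2):
    """Return the argv list for txt2img.py with exactly one sampler flag.

    Hypothetical helper: it mirrors the options used in this issue and
    assumes the script accepts --plms in the same way it accepts --dpm.
    """
    if sampler not in ("dpm", "plms"):
        raise ValueError(f"unsupported sampler: {sampler!r}")
    return [
        "python", "txt2img.py",
        "--prompt", prompt,
        f"--{sampler}",
        "--outdir", outdir,
        "--ckpt", ckpt,
        "--config", config,
        "--n_samples", str(n_samples),
    ]


prompt = ("Teyvat, Name:Keqing, Element:Electro, Weapon:Sword, Region:Liyue, "
          "Model type:Medium Female, Description:an anime character wearing "
          "a purple dress and cat ears.")

# Placeholder paths; substitute the real checkpoint and config locations.
for sampler in ("plms", "dpm"):
    argv = build_txt2img_command(prompt, "path/to/checkpoint.ckpt",
                                 "path/to/project.yaml", sampler=sampler)
    print(shlex.join(argv))  # shell-quoted command line, ready to copy
```

Keeping the prompt, checkpoint, and sample count identical makes it easier to attribute any difference in the output to the sampler alone.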
I don't know what caused it. I tried many different training runs: PLMS sampling was always bad, and DPM was a little better but still noisy. Looking forward to your reply! Thank you!
I will try this out myself and get back to you within the next two days.
How many steps have you trained?
@FrankLeeeee
I trained for 3000 epochs on the TEYVAT dataset (https://huggingface.co/datasets/Fazzie/Teyvat). The only change I made was to max_epochs in the configuration file (train_colossalai_teyvat.yaml).

Based on the last generated ckpt file (epoch=2917-step=11672.ckpt), the step count should be 11672. By that point the loss had essentially stopped decreasing, so I stopped training at epoch 2917.
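As a rough sanity check, the checkpoint name can be turned into an optimizer-steps-per-epoch figure. This is a minimal sketch that assumes the file name follows the PyTorch Lightning convention, where `epoch` is a zero-based index and `step` is the global optimizer step count; the helper name `steps_per_epoch` is made up for illustration.

```python
import re


def steps_per_epoch(ckpt_name):
    """Estimate optimizer steps per epoch from an 'epoch=E-step=S' name.

    Assumes E is a zero-based epoch index, so E + 1 epochs were completed
    by the time the checkpoint was written.
    """
    match = re.search(r"epoch=(\d+)-step=(\d+)", ckpt_name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {ckpt_name!r}")
    epoch_index = int(match.group(1))
    global_step = int(match.group(2))
    return global_step / (epoch_index + 1)


print(steps_per_epoch("epoch=2917-step=11672.ckpt"))  # 4.0
```

Under that assumption the run performed 4 optimizer steps per epoch, so the epoch count alone overstates how much training took place: 11672 steps is the more informative number when comparing against other runs.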
Is there any update on this? @JThh
Hi @LhaoH @Thomas2419 Diffusion convergence is a challenging task, and Colossal-AI provides a more effective development tool. This issue does not seem to be a bug caused by Colossal-AI. For personalized development issues, this goes beyond the scope of open source community support.
Feel free to contact me via email [email protected] to discuss formal collaboration, and you will receive professional high-priority support to help you get your product development done quickly and at low cost. Thanks.
This issue was closed due to inactivity. Thanks.
@LhaoH Your procedure is essentially fine-tuning, which won't meaningfully improve on the pretrained checkpoint because the dataset is so small. Have you run the training procedure on the LAION dataset?
@JThh I did not try it on the LAION dataset, but I ran 2 epochs on a custom dataset (14w, i.e. roughly 140,000 samples), totaling 5054 steps, and the generated images were still fairly poor.
prompt:"with slightly puffed shoulder and a flattering u neckline this cozy long sleeved top is easy to wear and effortlessly feminine"
prompt:"sharp vintage frame define stylish sunglass with indelible appeal"
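The figures quoted for this second run can be combined into an implied number of samples consumed per optimizer step. The sketch below reads "14w" as roughly 140,000 samples, which is an assumption, and the helper name is hypothetical; the result is only an estimate of the effective global batch size, not a value taken from the training config.

```python
def implied_samples_per_step(dataset_size, total_steps, epochs):
    """Approximate samples consumed per optimizer step.

    dataset_size is an assumed reading of "14w" (about 140,000 samples).
    """
    steps_per_epoch = total_steps / epochs  # 5054 / 2 = 2527 in this run
    return dataset_size / steps_per_epoch


print(round(implied_samples_per_step(140_000, 5054, 2), 1))  # 55.4
```

If the dataset size is read correctly, each step covered about 55 samples, so 5054 steps correspond to two full passes over the data.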

Because I was unable to solve this problem, I have given up on using ColossalAI. I have since successfully run the official fine-tuning code on my own dataset. Thank you very much for keeping this issue in mind.