fast-stable-diffusion
Recommended settings for subject training
I know this is a topic that has been discussed several times but I have found contradictory information. What are the recommended settings to train the model with images of a subject from Colab?
- What is the optimal number of images? How many of these should be from the face only?
- How many training steps should I start with, for both the UNet and the text encoder? And with what learning rates?
- Which model version works best for these cases? 1.5? 2.1-512? 2.1-768?
- If the session name is "mikeasdasd", what should be the name of the instance images? "mikeasdasd (1)"?
- Should I edit the "instance_prompt" text or leave it as is? If I have to edit it should I put for example "a photo of a mikeasdasd young man"?
- I don't have to use captions or concept images, right?
Thank you so much.
If it's a person, try the settings below. I've had great success with this, though I'm using img2img to get the best results.
- Minimum 15 pics; lighting and angle are important, so get some variation (mine have always been just the face).
- UNet training steps = number of images × 200, so 20 pics → 4000 steps. I always use 350 for the text encoder steps.
- Version 2.1-512 has been the best so far.
- Correct on the instance images, i.e. name them "mikeasdad", "mikeasdad (1)", "mikeasdad (2)", and so on.
- Leave "instance_prompt" as is; your prompts will get you what you want. One tip: once running Automatic1111, I get better results if I increase the weight of the trained token, so the prompt would look like "a photo of (mikeasdad:1.2)" etc. Just highlight mikeasdad, hold Ctrl, and press the up arrow.
- You don't have to use captions or concept images.

Edit: use 2e-6 for both learning rates.
Good luck!
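The rule of thumb above (UNet steps = number of images × 200, fixed text-encoder steps, 2e-6 for both learning rates) can be sketched as a small helper. The function name and dict keys here are illustrative only, not parameters from the Colab notebook:

```python
def dreambooth_settings(num_images, unet_steps_per_image=200,
                        text_encoder_steps=350, learning_rate=2e-6):
    """Suggest DreamBooth settings from the instance-image count,
    following the rule of thumb in this thread: UNet steps scale
    with the number of images; text-encoder steps stay fixed."""
    return {
        "unet_training_steps": num_images * unet_steps_per_image,
        "text_encoder_training_steps": text_encoder_steps,
        "unet_learning_rate": learning_rate,
        "text_encoder_learning_rate": learning_rate,
    }

# 20 instance images -> 4000 UNet steps, as in the example above
print(dreambooth_settings(20)["unet_training_steps"])  # 4000
```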
Could you explain this a bit further:
""a photo of (mikeasdad: 1.2) etc. Just highlight mikeasdad and hold control then press the up arrow."
I was able to do it with Ctrl+Up, but I'm not entirely sure what this means or what it's doing. Any info link or further explanation would help.
Also thanks for your other tips, they helped improve my results quite a bit!
I've always found it puts more emphasis on the trained subject, with more detail I guess. Other prompt tweaks will do that too, but I only use a few positive prompts and rely more on negative prompts.
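For context, in Automatic1111's webui the `(word:1.2)` syntax multiplies the attention given to that part of the prompt by the stated factor, and Ctrl+Up/Down simply edits that number for the highlighted text. A simplified sketch of how such a prompt decomposes into weighted segments (an illustration only, not the actual webui parser):

```python
import re

def parse_weighted_prompt(prompt):
    """Split a prompt into (text, weight) pairs.

    Handles only the explicit "(text:weight)" form; plain text gets
    weight 1.0. Simplified illustration of the webui's emphasis syntax.
    """
    pattern = re.compile(r"\(([^():]+):([\d.]+)\)")
    parts, pos = [], 0
    for m in pattern.finditer(prompt):
        if m.start() > pos:
            parts.append((prompt[pos:m.start()], 1.0))
        parts.append((m.group(1), float(m.group(2))))
        pos = m.end()
    if pos < len(prompt):
        parts.append((prompt[pos:], 1.0))
    return parts

print(parse_weighted_prompt("a photo of (mikeasdad:1.2) smiling"))
# [('a photo of ', 1.0), ('mikeasdad', 1.2), (' smiling', 1.0)]
```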
To add to this, I've been using the same formula, except for one huge improvement I found: 450 text-encoder steps at 1e-6. Fewer mutations, more likeness, and better styling. The UNet remains images × 200 at 2e-6.
For 1.5 or 2.1?
2.1
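Spelling out the revised recipe (for 2.1), which differs from the earlier one only in the text-encoder steps and learning rate; the function name is illustrative, not from the notebook:

```python
def revised_settings(num_images):
    """Revised recipe from this thread: UNet unchanged,
    text encoder trained longer at a lower learning rate."""
    return {
        "unet_training_steps": num_images * 200,
        "unet_learning_rate": 2e-6,
        "text_encoder_training_steps": 450,
        "text_encoder_learning_rate": 1e-6,
    }

print(revised_settings(15)["unet_training_steps"])  # 3000
```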
Thank you very much for the answers, they helped me a lot.
Do you think this configuration gives better results than the one discussed in #1127 (IMAGES: 10 / UNet: 600-800 steps at 2e-5 / TEXT: 350 steps at 1e-6)? In any case, why are the configuration values so different?
And about the resolution: does training work just as well for 2.1-512 as it does for 2.1-768? If so, do I need to modify any of the settings?
Thank you so much.