
Resume training on custom data

Open affanmehmood opened this issue 3 years ago • 12 comments

I want to further train stable-diffusion-v1-4 on my custom dataset. I couldn't find any training script in the repo. Can anyone tell me how this can be accomplished? Is there a training script available so I can resume training?

affanmehmood avatar Aug 24 '22 11:08 affanmehmood

https://github.com/nicolai256/Stable-textual-inversion_win

1blackbar avatar Aug 24 '22 20:08 1blackbar

> https://github.com/nicolai256/Stable-textual-inversion_win

That's a different method of achieving a similar result… I believe the OP was talking about resuming training of the SD model itself. I am also very interested in this, especially in light of the Dreambooth paper: https://dreambooth.github.io/ (I think it would be very interesting to try this approach with SD). There's training code and settings for latent diffusion, but I'm not sure whether it would be fruitful to try it with Stable Diffusion, especially without knowing the training parameters that were used.

janekm avatar Aug 28 '22 07:08 janekm

There's this, but it's a port and requires a beefy 48 GB GPU: https://github.com/Jack000/glid-3-xl-stable

chavinlo avatar Aug 28 '22 17:08 chavinlo

> There's this, but it's a port and requires a beefy 48 GB GPU: https://github.com/Jack000/glid-3-xl-stable

Oh, that is exciting! I'll have to give it a try. (My naive theory is to try something similar to the Dreambooth paper: find a prompt word that is basically unknown to SD, and then use that as the training caption for some new images.)

janekm avatar Aug 28 '22 17:08 janekm
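For illustration, a tiny sketch of that captioning idea; the identifier token, class word, and image directory below are all made-up assumptions for the example:

```python
from pathlib import Path

# "sks" is a common choice for a token the model has essentially no prior for;
# the class word and directory are illustrative assumptions.
rare_token = "sks"
class_word = "dog"
image_paths = sorted(Path("my_subject_images").glob("*.jpg"))
captions = [f"a photo of {rare_token} {class_word}"] * len(image_paths)
```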

There's a user who managed to get "full model training with validation" working on a 3090, but 54 GB of RAM is needed. If they release the code I will let you know @janekm

chavinlo avatar Aug 29 '22 07:08 chavinlo

> https://github.com/nicolai256/Stable-textual-inversion_win
>
> That's a different method of achieving a similar result… I believe the OP was talking about resuming training of the SD model itself. I am also very interested in this, especially in light of the Dreambooth paper: https://dreambooth.github.io/ (I think it would be very interesting to try this approach with SD). There's training code and settings for latent diffusion, but I'm not sure whether it would be fruitful to try it with Stable Diffusion, especially without knowing the training parameters that were used.

Is there any way to get the source code for Dreambooth, or will Google provide a web service for it like Midjourney?

wangyue-gagua avatar Sep 01 '22 08:09 wangyue-gagua

I'm also looking for some training code for this repo (either to train from scratch or to fine-tune). Could anyone point me in the right direction?

nihirv avatar Sep 10 '22 23:09 nihirv

Since Textual Inversion was already mentioned, it's worth noting here that the "Dreambooth" paper's technique has been implemented on top of Stable Diffusion (and has advantages in many scenarios where someone might otherwise think of fine-tuning the model directly): https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

janekm avatar Sep 11 '22 14:09 janekm
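For context, the core of the Dreambooth technique is a prior-preservation term added to the usual noise-prediction loss: the few subject images are captioned with a rare identifier, while images generated by the frozen model for the plain class prompt keep the class prior from drifting. A minimal sketch of the objective (the function name and default weight are illustrative, not taken from the linked repo):

```python
import torch.nn.functional as F

def dreambooth_loss(noise_pred_subject, noise_subject,
                    noise_pred_class, noise_class, prior_weight=1.0):
    # MSE on the subject images captioned with the rare identifier
    # (e.g. "a photo of sks dog").
    subject_loss = F.mse_loss(noise_pred_subject, noise_subject)
    # MSE on images the frozen model generated for the plain class prompt
    # ("a photo of a dog"), which preserves the class prior.
    prior_loss = F.mse_loss(noise_pred_class, noise_class)
    return subject_loss + prior_weight * prior_loss
```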

Thank you for pointing me to that @janekm! However, what I'm looking to do is condition the model on another image, i.e. I want to feed it two images (instead of image + text) and use the second image as a condition. I've been thinking of just replacing the CLIP text embedding with the embedding of the second image, but I think this will require me to actually fine-tune a diffusion model instead of using textual inversion.

nihirv avatar Sep 11 '22 15:09 nihirv
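A minimal sketch of that idea, assuming the Hugging Face diffusers/transformers stack rather than this repo's code; the projection layer is an assumption needed because CLIP ViT-L/14's vision hidden size (1024) differs from the 768-dimensional text embeddings SD v1 expects:

```python
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel
from diffusers import UNet2DConditionModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen CLIP vision tower used in place of the text encoder.
image_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
).to(device)

# Trainable projection from the vision hidden size (1024) to the UNet's
# cross-attention dimension (768 for SD v1).
proj = torch.nn.Linear(image_encoder.config.hidden_size,
                       unet.config.cross_attention_dim).to(device)

def image_conditioning(pil_image):
    """Encode a conditioning image into a token sequence the UNet can cross-attend to."""
    pixels = processor(images=pil_image, return_tensors="pt").pixel_values.to(device)
    tokens = image_encoder(pixel_values=pixels).last_hidden_state  # (1, 257, 1024)
    return proj(tokens)                                            # (1, 257, 768)

# During fine-tuning, this would be passed where the text embeddings normally go:
# noise_pred = unet(noisy_latents, timesteps,
#                   encoder_hidden_states=image_conditioning(img)).sample
```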

Your message has been received. Wishing you happiness every day.

RexSi avatar Sep 11 '22 15:09 RexSi

This repo has been doing "traditional" fine-tuning on top of Stable Diffusion, so it may have the code you're looking for (the CompVis repo also has training code in main.py, but I've seen reports that it doesn't work out of the box): https://github.com/harubaru/waifu-diffusion (train.sh should be the entry point)

janekm avatar Sep 11 '22 15:09 janekm
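For reference, a minimal sketch of one step of such "traditional" fine-tuning, written against the Hugging Face diffusers/transformers stack rather than this repo's main.py; the learning rate and the frozen/trained split are illustrative choices:

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"

vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Only the UNet is trained here; the VAE and text encoder stay frozen.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    # Encode images to latents (0.18215 is SD's latent scaling factor)
    # and captions to CLIP text embeddings.
    latents = vae.encode(pixel_values.to(device)).latent_dist.sample() * 0.18215
    ids = tokenizer(captions, padding="max_length", truncation=True,
                    max_length=tokenizer.model_max_length,
                    return_tensors="pt").input_ids.to(device)
    encoder_hidden_states = text_encoder(ids)[0]

    # Standard DDPM objective: predict the noise added at a random timestep.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=device).long()
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=encoder_hidden_states).sample

    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```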

> Thank you for pointing me to that @janekm! However, what I'm looking to do is condition the model on another image, i.e. I want to feed it two images (instead of image + text) and use the second image as a condition. I've been thinking of just replacing the CLIP text embedding with the embedding of the second image, but I think this will require me to actually fine-tune a diffusion model instead of using textual inversion.

Hello, may I ask how you ended up solving this? I also want to use an image as a conditional input to guide the generation of a new image. Are there specific training methods for this?

bowen099 avatar Jul 02 '25 08:07 bowen099