
Has something been changed?

Open Deexaw opened this issue 2 years ago • 51 comments

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong: after loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps; same results. I even tried with 10 pictures and it was still the same. I usually train a face on 30-70 photos with a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken; I even tried to train a face over another model, still the same. Has anyone had the same problem? Sorry for my English :) [image]

Deexaw avatar Feb 20 '23 00:02 Deexaw

And what is this? It's the first time I've seen it. [image]

Deexaw avatar Feb 20 '23 00:02 Deexaw

And what is this? It's the first time I've seen it. [image]

That's just the model downloading

TheLastBen avatar Feb 20 '23 03:02 TheLastBen

Try the latest colab, the one you're using might be broken.

TheLastBen avatar Feb 20 '23 03:02 TheLastBen

I'm having the same issue today, even with an empty prompt I'm getting only variations of my instance images 🤷🏼‍♂️

Omenizer avatar Feb 20 '23 09:02 Omenizer

Try the latest colab, the one you're using might be broken.

Well, I always use the latest version; I even checked it today. Tried again, same awful results. I don't know...

Deexaw avatar Feb 20 '23 10:02 Deexaw

Try the latest colab, the one you're using might be broken.

Hello! The problem comes from one of your recent updates. I know for certain that the commit "e59c0fc - fix RAM problem for good, thanks to @Daviljoe193" was working very well. I don't know what changed after that, but now the model gives back only the images used in training (distorted and weird). I think the solution is very simple: just revert everything to that last working commit.

Edit: With exactly the same settings as before, the models are now a joke.

Oh, yes RAM is fine now! Been having fun with merges again 👍

Omenizer avatar Feb 20 '23 11:02 Omenizer

I'm having the same issue today, even with an empty prompt I'm getting only variations of my instance images 🤷🏼‍♂️

Yes. I'm using the latest colab and today I have the same problem: the prompt does not work properly. It only generates strange variations of the instance images.

hidecreature avatar Feb 20 '23 19:02 hidecreature

Also having this issue fwiw.

csilv avatar Feb 20 '23 19:02 csilv

As I said, the easiest solution (for Ben) would be to revert everything he did in the last 2 days. I don't know what he changed, but even the checkpoint merger no longer works (a "CUDA out of memory" error for the same models that merged fine a few days ago when everything worked well).

I thought I was losing my mind. Glad it wasn't just me, though I burnt through a lot of Colab compute thinking it was. Should have checked here first!

order661015 avatar Feb 20 '23 21:02 order661015

Since I got mentioned here (I guess), I'll weigh in on this issue.

People say that the last "good" commit was the one where my suggestion for fixing the memory leak was applied. Let's go through the 14 commits between then and now, commit by commit. Note: I'm NOT a developer here; despite my mention in a few commits, I'm just a clown who happens to look like a developer.

Click to expand, it's a lot

Commit 1

ep1

Pretty easy here. The learning rate for UNet training was lowered from 5e-6 for 1000 steps to 2e-6 for 1500 steps. Realistically, this should reduce any weird artifacting and reduce the likelihood of overfitting, at the cost of a small increase in training time.
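As a rough illustration of the tradeoff in this commit (my own sketch, not part of the notebook; the only inputs are the two schedules named above), you can compare the cumulative update budget lr × steps:

```python
# Rough comparison of the old and new default UNet schedules from this
# commit. lr * steps is only a crude proxy for "how much the model gets
# pushed", not an exact equivalence, but it shows the direction of change.
old = {"lr": 5e-6, "steps": 1000}
new = {"lr": 2e-6, "steps": 1500}

old_budget = old["lr"] * old["steps"]   # ~5.0e-3
new_budget = new["lr"] * new["steps"]   # ~3.0e-3

print(f"old: {old_budget:.1e}  new: {new_budget:.1e}  "
      f"ratio: {new_budget / old_budget:.2f}")
```

So despite running 50% more steps, the new schedule applies roughly 40% less cumulative learning, which matches "less artifacting and overfitting, slightly longer training".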

Commit 2-3

ep2 ep3

This one isn't too hard to explain either. A tarball containing the dependencies for Dreambooth/Automatic1111 was updated to use Python 3.10 instead of Python 3.9, and while the old tarball is still there on Hugging Face, there shouldn't be anything here that could break anything.

Commit 4-5

ep4 ep5

These two commits are related, so I have to cover them together. First, part of this just suppresses a pointless "warning" that "warns" that there's a cool shiny thing we could use but don't need. Second, it appears to (I'm just a guy, not the maintainer or a Python expert) prefetch a dependency for the webui, plus some mild code cleanup. Again, nothing here could break anything.

Commit 6

ep6

That same prefetch from commit 5, but applied to the dedicated SD notebook.

Commit 7

ep7

More guesswork is needed here. It seems all this change does is tell the webui not to attempt to fetch the things that commits 5-6 have already fetched.

Commit 8

ep8

Commit 7, but for Dreambooth.

Commit 9

ep9

A fix for an error I could've sworn I've seen on the issues page before, but can't seem to find at the moment. 100% couldn't affect training, though.

Commit 10-12

ep10 ep11 ep12

Self-explanatory: adds everything needed for ControlNet to work, and it's only applied to the Automatic1111 notebook, not the Dreambooth notebook.

Commit 13

ep13

Self-explanatory: just fixes a problem with resuming training on 768 v2.x models in Dreambooth. Likely not what's causing people trouble here.

Commit 14

ep14

Uh, it's a 4-character change to a URL being git cloned. Nothing to see here.

I'm not trying to invalidate what people have said here about trouble with overfitting, since I too have had a ton of trouble getting anything that's not either overfitted or ugly (though I still don't fully understand Dreambooth's settings, and I haven't trained any models in over a month), but nothing of significance has changed in the last 14 commits.

Daviljoe193 avatar Feb 21 '23 04:02 Daviljoe193

So what do you suggest? How do we train over custom models now, if it's always overfitting? I never had this problem since I started using it. The last time I trained without any problem was Feb 18; the next day it all started.

Deexaw avatar Feb 21 '23 05:02 Deexaw

Something was definitely broken.

  • Resuming training for SD2.1-768px based models was throwing an error.
  • Training on SD2.1-512px did not crash, but the result was terrible. The upper 'loss' values varied around 0.5-0.6, which is normal, but the lower 'loss' values were about 1e-4, which is very strange for a first training run.

P.S. I recently ran the training again with the same dataset and parameters. The 'loss' value now varies between 1e-1 and 0.7. Looks like the problem is fixed.

iqddd avatar Feb 21 '23 06:02 iqddd
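A tiny sanity check along the lines of what iqddd describes could flag such runs early. This is a hypothetical helper, not anything in the notebook; the 1e-3 floor is a guessed threshold based on the loss ranges reported above (healthy runs bouncing around 1e-1 to 0.7, the broken run dipping to ~1e-4):

```python
# Hypothetical sketch: flag a Dreambooth run whose per-step loss floor is
# suspiciously low early in training, which the thread associates with the
# model memorising the instance images. The 1e-3 threshold is a guess.
def loss_looks_suspicious(losses, floor=1e-3):
    """Return True if any early-training loss dips below `floor`."""
    return min(losses) < floor

healthy = [0.62, 0.31, 0.12, 0.55]       # like the post-fix run (1e-1 .. 0.7)
broken = [0.58, 0.0002, 0.51, 0.0001]    # floor near 1e-4, as reported

print(loss_looks_suspicious(healthy))  # False
print(loss_looks_suspicious(broken))   # True
```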

[image] So I tried again today with a 1e-6 text encoder learning rate and 10 photos. It was a mess. Then I tried with the same photos and 4e-7, and it's better.

Deexaw avatar Feb 21 '23 08:02 Deexaw

There are no standard settings for all datasets; you have to find the right settings for your dataset.

TheLastBen avatar Feb 21 '23 08:02 TheLastBen
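Since there is no universal recipe, one practical approach is a small sweep over the values people in this thread actually tried, with a short run for each. This is my own sketch; the combinations below come from the thread, not from any official defaults:

```python
# Hypothetical per-dataset sweep: every value here appears somewhere in
# this thread; none of them is an official recommendation.
from itertools import product

unet_lrs = [5e-6, 2e-6]      # the old and new notebook defaults
text_lrs = [1e-6, 4e-7]      # values Deexaw reported trying
step_counts = [650, 1500]    # a short run vs. the commit-1 default

candidates = list(product(unet_lrs, text_lrs, step_counts))
for unet_lr, text_lr, steps in candidates:
    print(f"unet_lr={unet_lr:.0e} text_lr={text_lr:.0e} steps={steps}")
print(len(candidates), "short runs to compare")
```

Train each combination briefly, then keep whichever checkpoint stops reproducing the instance images verbatim.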

I think it's something with the UNet; it became more sensitive. Now I used only 23 photos with 650 steps, text encoder at 1e-6, and it's OK. Usually I would use 2300 steps for that.

Deexaw avatar Feb 21 '23 09:02 Deexaw
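Back-of-the-envelope arithmetic on the numbers above (23 photos, 650 steps now versus the usual 2300) shows how much the effective steps-per-image budget dropped:

```python
# Steps-per-image before and after the update, using Deexaw's numbers.
images = 23
old_total_steps = 2300   # what Deexaw "usually" used
new_total_steps = 650    # what now suffices

print(old_total_steps // images)  # 100 steps per image before
print(new_total_steps // images)  # 28 steps per image now
```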

Yes, it became more efficient, so you don't need 4000 steps to train on a single subject.

TheLastBen avatar Feb 21 '23 09:02 TheLastBen

Yes, it became more efficient, so you don't need 4000 steps to train on a single subject.

Oh, if only you had said that earlier 😄 Where can we read about those updates?

Deexaw avatar Feb 21 '23 10:02 Deexaw

I have finally made it work as before. Check out this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb It took me some good hours to get it working, because I'm no coder. I used my logical thinking to revert everything to a stable version. I don't know how to fork a specific commit, so I downloaded the older commit, uploaded it to a new repository, and modified everything accordingly.

Edit: No more distortions and weird images. Add the names of the .jpg files to the prompts to get more of your character's characteristics.

Thanks mate! Going to test it today.

Deexaw avatar Feb 21 '23 10:02 Deexaw

I have finally made it work as before. Check out this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb It took me some good hours to get it working, because I'm no coder. I used my logical thinking to revert everything to a stable version. I don't know how to fork a specific commit, so I downloaded the older commit, uploaded it to a new repository, and modified everything accordingly.

Edit: No more distortions and weird images. Add the names of the .jpg files to the prompts to get more of your character's characteristics.

THANKS!!! IT WORKS AMAZINGLY! Finally; I was trying everything. The new version always deformed faces, even with low steps.

yengalvez avatar Feb 21 '23 17:02 yengalvez

Training SD2.1 still gives terrible results. 00001-3962062657 What the hell is this supposed to look like? Undertraining or overtraining? UNet or text encoder?

iqddd avatar Feb 21 '23 17:02 iqddd

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong: after loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps; same results. I even tried with 10 pictures and it was still the same. I usually train a face on 30-70 photos with a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken; I even tried to train a face over another model, still the same. Has anyone had the same problem? Sorry for my English :) [image]

This has been happening to me as well. I trained my model the same way that worked fine a few days ago, and these are my results now. This is supposed to be a tiger eating fish in a jungle. 00005-1610653001

00002-1610652998

InternalMegaT avatar Feb 21 '23 17:02 InternalMegaT

Guys, what are your base models? With SD1.5 it probably still works correctly; with SD2.1 (512 or 768) it gets terrible results. Moreover, the larger the dataset, the worse the resulting model. The picture I posted above was generated by a model trained on 180 photos.

iqddd avatar Feb 21 '23 18:02 iqddd

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong: after loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps; same results. I even tried with 10 pictures and it was still the same. I usually train a face on 30-70 photos with a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken; I even tried to train a face over another model, still the same. Has anyone had the same problem? Sorry for my English :) [image]

This has been happening to me as well. I trained my model the same way that worked fine a few days ago, and these are my results now. This is supposed to be a tiger eating fish in a jungle. 00005-1610653001

00002-1610652998

Just use fewer UNet and text encoder steps; it became more sensitive somehow.

Deexaw avatar Feb 21 '23 18:02 Deexaw

It seems to me that the issue is not the number of steps. In the last example I used only 30 steps per image, and the result is still terrible.

iqddd avatar Feb 21 '23 18:02 iqddd

Guys, what are your base models? With SD1.5 it probably still works correctly; with SD2.1 (512 or 768) it gets terrible results. Moreover, the larger the dataset, the worse the resulting model. The picture I posted above was generated by a model trained on 180 photos.

I'm using 2.1 (768) and it's getting trash results; however, 3 days ago it was perfectly fine with the same dataset and training settings.

InternalMegaT avatar Feb 21 '23 18:02 InternalMegaT

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong: after loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps; same results. I even tried with 10 pictures and it was still the same. I usually train a face on 30-70 photos with a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken; I even tried to train a face over another model, still the same. Has anyone had the same problem? Sorry for my English :) [image]

This has been happening to me as well. I trained my model the same way that worked fine a few days ago, and these are my results now. This is supposed to be a tiger eating fish in a jungle. 00005-1610653001 00002-1610652998

Just use fewer UNet and text encoder steps; it became more sensitive somehow.

I will just roll back a few days; I don't want to figure out this new way of training models.

InternalMegaT avatar Feb 21 '23 18:02 InternalMegaT

Test this colab from @Bullseye-StableDiffusion: https://colab.research.google.com/github/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb I'm testing it right now.

Deexaw avatar Feb 21 '23 18:02 Deexaw