Dream artist issue 1290

Open isamu-isozaki opened this issue 2 years ago • 2 comments

This is a work-in-progress PR for issue #1290 for dream artist. The dream artist model seems to be mostly the same as the textual inversion model, with a few changes:

  1. Negative embeddings are combined with regular embeddings in the course of training.
  2. The loss function can be a ConvNeXt discriminator or an L1 loss.
  3. Autocrop is used to get an essential part of the images before training.
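
A rough sketch of points 1 and 2, assuming the positive and negative embeddings are combined in a classifier-free-guidance style (the exact formula is not confirmed anywhere in this thread). `combine_predictions`, `l1_loss`, and `guidance_scale` are hypothetical names for illustration, and plain lists stand in for tensors:

```python
def combine_predictions(pred_pos, pred_neg, guidance_scale=3.0):
    """Move the prediction away from the negative branch, toward the positive one.

    Assumption: a classifier-free-guidance-style mix; not confirmed by this PR.
    """
    return [n + guidance_scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


def l1_loss(pred, target):
    """Mean absolute error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Toy values standing in for the model's noise predictions.
pred_pos = [1.0, 2.0, 3.0]  # prediction conditioned on the positive embedding
pred_neg = [0.5, 1.0, 1.5]  # prediction conditioned on the negative embedding
target = [1.0, 2.0, 3.0]    # the noise that was actually added

combined = combine_predictions(pred_pos, pred_neg, guidance_scale=1.0)
loss = l1_loss(combined, target)
```

With `guidance_scale=1.0` the combination reduces to the positive prediction alone; larger values push further away from the negative branch.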

This is the first version I pulled from my repo, but I'll work on improving it and seeing how it does compared to textual inversion. It probably won't work yet, but I'll try fixing that over time.

The code is mainly taken from the textual inversion example, but I did add some QoL changes that I used for my personal project. Will def remove some to simplify this PR.

isamu-isozaki avatar Nov 26 '22 05:11 isamu-isozaki

Can you please tell me the progress for now? Because I also want to try to use dreamartist, and I can help if needed.

theSha1do1w avatar Dec 20 '22 09:12 theSha1do1w

@theSha1do1w Hellooo. Sorry for the delay, I'll push what I have by the end of the day! The main issue was that the training result wasn't going well but I think I fixed that bug. Let me know if you have that bug too once I push!

isamu-isozaki avatar Dec 20 '22 19:12 isamu-isozaki

Just updated the code to the code in my branch. I'll train it tomorrow!

isamu-isozaki avatar Dec 21 '22 03:12 isamu-isozaki

Thanks for your push! I think EMA (Exponential Moving Average) is an important part of dream artist; you can check the EMAModel in train_text_to_image.py. I'll try to add it from your branch.
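
For reference, the idea behind EMA is roughly the following. This is a minimal sketch in plain Python, not the `EMAModel` class itself; `ema_update` and `decay` are names made up for illustration:

```python
def ema_update(shadow, params, decay=0.999):
    """Return the new averaged weights: decay * shadow + (1 - decay) * params."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


# Toy "embedding" that sits at [1.0, 2.0] for three training steps.
shadow = [0.0, 0.0]
for step_params in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    shadow = ema_update(shadow, step_params, decay=0.5)
```

The averaged copy (`shadow`) trails the live weights and smooths out step-to-step noise; a decay close to 1 averages over a longer window.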

theSha1do1w avatar Dec 21 '22 06:12 theSha1do1w

@theSha1do1w great point! I'll try training today with that and put results here

isamu-isozaki avatar Dec 21 '22 16:12 isamu-isozaki

And feel free to add too! I think I still have one more change to add for getting the original from the scheduler, but then I'm pretty much done with the implementation.

isamu-isozaki avatar Dec 21 '22 16:12 isamu-isozaki

@theSha1do1w hello! I did the ema thing and I'll test if it works. Do you want to be added as a contributor to my repo or like do a pr? Honestly happy with either.

isamu-isozaki avatar Dec 22 '22 04:12 isamu-isozaki

Training without ema for now. In the morning will put the results here.

isamu-isozaki avatar Dec 22 '22 04:12 isamu-isozaki

Thanks! But in my tests here, the performance of dream artist is not as good as that of dream booth, and it takes more training time and more complicated hyperparameter settings. The negative prompt may be a good idea, but I think it needs a new way of training. Looking forward to your results!

theSha1do1w avatar Dec 22 '22 07:12 theSha1do1w

@theSha1do1w yeah I think I got the same results in a way. I was trying to fine-tune with pictures of a dog below 1 but I keep getting results like [image: media_images_samples_6414_b1eaea2c5919232bee9a], though it might be that my implementation is a bit wrong. I did notice some difference in how the tokens are initialized. But I think it boils down to 2 points:

  1. The initializer token won't generate the class you want because the negative tokens would stop that from happening at least in the beginning
  2. During training, in the original implementation, they are trained as "A photo of positive token" and "A photo of negative token" which might be unstable since the model might be trying to go away from the "A photo" part too.

isamu-isozaki avatar Dec 22 '22 15:12 isamu-isozaki

I'll try training again with the fix. I'll also lower the lr

isamu-isozaki avatar Dec 22 '22 16:12 isamu-isozaki

Slight update. It seems like if you initialize the negative token with the initial token (in this case dog) plus some noise times 1e-3, and the positive token with the usual initial token, it generates a dog for some reason. I'll try asking the authors why this is.

[image: media_images_samples_399_e6876c35ddd35f4fd6dc]

Also, from the paper, it seems like we can have the negative token start with a different initial token than the positive one. For example, "a photo of a dog" and "a photo of a cat" can be a starting point.
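
The noisy initialization described above could look something like this. It is only a sketch: Gaussian noise is an assumption (the comment only says "some noise times 1e-3"), `init_negative_embedding` is a hypothetical helper, and the embedding values are made up rather than real text-encoder weights:

```python
import random


def init_negative_embedding(init_embedding, noise_scale=1e-3, seed=0):
    """Copy the initializer embedding and add small Gaussian noise to each entry."""
    rng = random.Random(seed)
    return [x + noise_scale * rng.gauss(0.0, 1.0) for x in init_embedding]


dog_embedding = [0.12, -0.40, 0.33, 0.05]  # made-up values for the "dog" token
positive = list(dog_embedding)             # positive token: the usual initializer
negative = init_negative_embedding(dog_embedding)  # negative token: dog + tiny noise
```

Both tokens start almost on top of the initializer, so the negative token does not begin by pushing the model away from the class.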

isamu-isozaki avatar Dec 22 '22 18:12 isamu-isozaki

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 16 '23 15:01 github-actions[bot]

Hi. I wanted to test this out with multiple tokens so I'm mainly working on this pr to be merged first

isamu-isozaki avatar Jan 16 '23 15:01 isamu-isozaki

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 25 '23 15:02 github-actions[bot]

@patrickvonplaten Ah, yup very true. I wasn't able to get good results anyway so happy to close this pr.

isamu-isozaki avatar Mar 01 '23 16:03 isamu-isozaki

Haha, just noticed I forgot to .gitignore a lot of files. Let me know if anyone's interested in working on this pr and I can clean up more

isamu-isozaki avatar Mar 01 '23 16:03 isamu-isozaki

cleaned up a bit. But yeah, let me know if anyone is still interested!

isamu-isozaki avatar Mar 01 '23 16:03 isamu-isozaki
