diffusers Dream artist issue 1290

This is a work-in-progress PR for issue #1290 (DreamArtist). The DreamArtist model seems to be mostly the same as the textual inversion model, with a few changes:

Negative embeddings are combined with the regular embeddings during training.
The loss function can be a ConvNeXt discriminator or an L1 loss.
Autocrop is used to extract the essential part of each image before training.
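
A minimal sketch of the first two points, assuming the positive and negative predictions are mixed in the classifier-free-guidance form `neg + scale * (pos - neg)` (the thread does not spell out the exact combination, so treat that as an assumption). Plain Python lists stand in for tensors, and `combine_predictions`, `l1_loss`, and `guidance_scale` are illustrative names, not taken from the PR:

```python
def combine_predictions(pred_pos, pred_neg, guidance_scale=3.0):
    """Mix the two noise predictions: neg + scale * (pos - neg).

    Assumed form; the actual PR may combine the embeddings differently.
    """
    return [n + guidance_scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


def l1_loss(pred, target):
    """Mean absolute error between the prediction and the target noise."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Toy values, purely illustrative.
pred_pos = [0.5, -0.2, 0.1]   # prediction conditioned on the positive embedding
pred_neg = [0.3, 0.0, 0.1]    # prediction conditioned on the negative embedding
target = [0.9, -0.6, 0.1]     # noise that was actually added

combined = combine_predictions(pred_pos, pred_neg, guidance_scale=3.0)
loss = l1_loss(combined, target)
```

In a real training loop the loss would be backpropagated into both the positive and the negative embedding, since both feed into `combined`.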

This is the first version I pulled from my repo, but I'll work on improving it and seeing how it compares to textual inversion. It probably won't work yet, but I'll try fixing that over time.

The code is mainly taken from the textual inversion example, but I added some quality-of-life changes that I used for my personal project. I'll definitely remove some to simplify this PR.

Nov 26 '22 05:11 isamu-isozaki

The documentation is not available anymore as the PR was closed or merged.

Nov 26 '22 05:11 HuggingFaceDocBuilderDev

Can you please tell me the current progress? I also want to try DreamArtist, and I can help if needed.

Dec 20 '22 09:12 theSha1do1w

@theSha1do1w Hellooo. Sorry for the delay, I'll push what I have by the end of the day! The main issue was that the training results weren't good, but I think I fixed that bug. Let me know if you hit that bug too once I push!

Dec 20 '22 19:12 isamu-isozaki

Just updated the code to the code in my branch. I'll train it tomorrow!

Dec 21 '22 03:12 isamu-isozaki

Thanks for your push! I think EMA (Exponential Moving Average) is an important part of DreamArtist; you can see it in the EMAModel in train_text_to_image.py. I'll try to add it from your branch.
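
For reference, the EMA idea can be sketched as below. This is a dependency-free illustration, not the `EMAModel` from train_text_to_image.py; `SimpleEMA` and its `decay` default are hypothetical, and plain lists stand in for the embedding tensor:

```python
class SimpleEMA:
    """Keeps an exponential moving average ("shadow") of some parameters."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # copy so the live parameters are not aliased

    def step(self, params):
        """Update the shadow: shadow = decay * shadow + (1 - decay) * params."""
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p for s, p in zip(self.shadow, params)]
        return self.shadow


# Toy usage: one update after the live parameters moved from [0, 1] to [1, 1].
ema = SimpleEMA([0.0, 1.0], decay=0.9)
ema.step([1.0, 1.0])
```

The shadow copy is what would be saved or used for sampling, while the optimizer keeps updating the live embedding.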

Dec 21 '22 06:12 theSha1do1w

@theSha1do1w great point! I'll try training today with that and put results here

Dec 21 '22 16:12 isamu-isozaki

And feel free to add it too! I think I still have one more change to add, for getting the original from the scheduler, but then I'm pretty much done with the implementation.

Dec 21 '22 16:12 isamu-isozaki

@theSha1do1w hello! I did the EMA thing and I'll test if it works. Do you want to be added as a contributor to my repo, or would you rather do a PR? Honestly happy with either.

Dec 22 '22 04:12 isamu-isozaki

Training without EMA for now. I'll put the results here in the morning.

Dec 22 '22 04:12 isamu-isozaki

Thanks! But in my tests, DreamArtist does not perform as well as DreamBooth, and it takes more training time and more complicated hyperparameter settings. The negative prompt may be a good idea, but I think it needs a new way of training. Looking forward to your results!

Dec 22 '22 07:12 theSha1do1w

@theSha1do1w yeah, I think I got the same results in a way. I was trying to fine-tune with pictures of a dog, but I keep getting results like media_images_samples_6414_b1eaea2c5919232bee9a. It might be that my implementation is a bit wrong; I did notice some differences in how the tokens are initialized. But I think it boils down to 2 points:

The initializer token won't generate the class you want, because the negative tokens stop that from happening, at least in the beginning.
During training, the original implementation uses "A photo of positive token" and "A photo of negative token", which might be unstable since the model might also try to move away from the "A photo" part.

Dec 22 '22 15:12 isamu-isozaki

I'll try training again with the fix. I'll also lower the learning rate.

Dec 22 '22 16:12 isamu-isozaki

Slight update. It seems that if you initialize the negative token as the initial token (in this case "dog") plus some noise scaled by 1e-3, and the positive token as the usual initial token, it generates a dog for some reason. I'll try asking the authors why this is.

media_images_samples_399_e6876c35ddd35f4fd6dc

Also, from the paper, it seems we can have the negative token start from a different initial token than the positive one; for example, "a photo of a dog" and "a photo of a cat" can be a starting point.
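
The noise-based initialization described in this comment could look roughly like the sketch below. It assumes Gaussian noise (the comment only says "some noise times 1e-3"), uses plain lists in place of embedding tensors, and the function name, seed, and embedding values are made up for illustration:

```python
import random


def init_negative_embedding(init_embedding, noise_scale=1e-3, seed=0):
    """Return the initializer embedding plus small Gaussian noise.

    Gaussian noise is an assumption; only the 1e-3 scale comes from the thread.
    """
    rng = random.Random(seed)
    return [x + noise_scale * rng.gauss(0.0, 1.0) for x in init_embedding]


# Hypothetical embedding for the initializer token "dog".
dog_embedding = [0.12, -0.40, 0.33, 0.05]

positive = list(dog_embedding)                     # usual initial token, unchanged
negative = init_negative_embedding(dog_embedding)  # same token, slightly perturbed
```

For the second idea, `negative` would instead be copied from a different token's embedding (for example "cat"), with no noise needed.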

Dec 22 '22 18:12 isamu-isozaki

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Jan 16 '23 15:01 github-actions[bot]

Hi. I wanted to test this out with multiple tokens, so I'm mainly working on getting this pr merged first.

Jan 16 '23 15:01 isamu-isozaki

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Feb 25 '23 15:02 github-actions[bot]

@patrickvonplaten Ah, yup very true. I wasn't able to get good results anyway so happy to close this pr.

Mar 01 '23 16:03 isamu-isozaki

Haha, just noticed I forgot to .gitignore a lot of files. Let me know if anyone's interested in working on this pr and I can clean up more

Mar 01 '23 16:03 isamu-isozaki

Cleaned up a bit. But yeah, let me know if anyone is still interested!

Mar 01 '23 16:03 isamu-isozaki

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Mar 01 '23 16:03 HuggingFaceDocBuilderDev
