DreamArtist-sd-webui-extension
Any successful result replication?
Hey guys, I am just wondering if anyone has successfully replicated the 1-image embedding and recreated results similar to 7eu7d7's? So far I have had no luck testing it myself.
Training the embedding takes around 2.5 hours on my 3090 GPU for 8000 steps, and the results only faintly resemble that one training image.
Haven't had any great results myself, but that is testing on my own style, which seems to be adjacent to most of the models.
It is odd that you're at 2.5 hours; I am at 40 minutes per 4000 steps.
For replication it may also matter whether xformers is enabled for textual inversion; if it was, that may make a 1-to-1 replication impossible.
Edit: adding comparison images.
Original
What it generates
I've tried with 15 images and with 5 images and not had any success with it learning a person. I think there are far too many variables with no explanation of what they should be set to.
The embedding can influence image generation, but fails to replicate my waifu. Need some more testing probably.
I used 22 images, but they are basically one image flipped and cropped into different sizes with some small variations, for example with or without a hat. I got decent results using 6 vectors for both the positive and negative prompt, learning rate 0.0025, cfg scale 3, reconstruction loss weight 1, negative lr weight 1, a custom prompt template that does not use filewords, image size 384x384, and 10000 steps.
I tried without filewords in the template and it heavily copied the backgrounds along with the subject, to the point that sometimes it would just generate a landscape without any people at all. That was at 6000 steps or so before I canceled.
My image has a simple white background, so maybe that's why. Also, 7eu7d7 once mentioned that this algorithm works best when clip skip is set to 1 instead of 2, which most people seem to use nowadays. I've always used clip skip 2, but maybe you can check that.
I should also mention that even though the negative prompt seems to improve the general quality, it may also cause mosaic or distorted shapes in the background and distorted hands, as in my example.
Edit: And I should add that I didn't use filewords only because it always throws an error when I try to.
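For context, a prompt template that does not use filewords is just a plain-text file of prompt lines containing the [name] token and no [filewords] token. An illustrative example (not the actual file used above) could look like this:

```
a photo of [name]
a portrait of [name], white background
[name], full body, standing
a close-up of [name]
```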
I tried training with a single image; weirdly, I don't see any obvious visual improvements.
Every parameter here is the same except for using Anythingv3 for the model
Clip skip 1, anythingv3 vae, no hypernetwork, xformer on, reconstruction on
The prompt template file is: [name] with purple eyes and purple hair wearing a purple kimono outfit standing in a field of flowers with a purple sword in her hand and a purple butterfly flying around her, by Masaaki Sasamoto
The training image is the character Raiden Shogun from Genshin.
my preview results every 500 steps: https://imgur.com/a/3qe6Fb3 my loss: https://imgur.com/a/unEBSD2
Nothing resembles the training image; most notably, the subject is standing in the training image but most previews show her sitting.
Attempting to replicate Nahida right now, since working on my own subject has failed various times.
This is just my experience, but you probably shouldn't include any feature you want the model to learn in the template.
Tried it today with photos of myself and got similar results: some generations would just render a landscape, and others showed a lot of identity loss (3/6 vectors, 0.005 lr, 3000 steps).
For those who can generate valid images in the training logs but fail to replicate them in txt2img: if your embedding file is called zzzz1234, your prompt is "art by zzzz1234", not just the file name. You can see this prompt during training.
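To illustrate (zzzz1234 is just the hypothetical embedding name from the comment above, and the -neg counterpart follows the ani-nahida / ani-nahida-neg naming that shows up later in this thread):

```
prompt:          art by zzzz1234, 1girl, forest, flowers
negative prompt: zzzz1234-neg, lowres, bad anatomy
```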
I tried again training with a single image, in an attempt to replicate the results of ani-nahida that 7eu7d7 has shown in the README. I used the exact same image to train and named this embedding ani-greengod.
Every parameter is the same except for using Anythingv3 as the model: clip skip 1, anythingv3 VAE, no hypernetwork, xformers on, reconstruction on. I also followed the suggestion from @zhupeter010903 and used an empty template.
The results definitely don't look like those from the ani-nahida.pt that 7eu7d7 provided.
my preview results for every 500 steps: https://imgur.com/a/LyQqYmM the results of my replication attempt: https://imgur.com/a/nUxRnjh
Here's a comparison using only the TI, without any other text prompt.
Same for me: at 5000 steps all I got is something that resembles the girl, but nothing of the detail shown in the readme. If this isn't going to work, it's quite sad honestly.
Guys... I've gotten it to work great, even for a Mario video game case and cool aliens by H.R. Giger, giving me similar photos!
Take a look at my results. I have very, very good ones, though I think all I can post here is safe-for-work photos. You gotta see them! Anyway, here is just one for show. Input pic: https://ibb.co/hZrs0jq Results (similar images): https://ibb.co/dkkW0qy Very HQ images, kind of similar: https://ibb.co/nC85xGp
All you do is go here: https://www.kaggle.com/code/miolovers1/stable-diffusion-automatic1111 and replace the upper !git clone line with this: !git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui. The app link appears in the second-to-last cell, believe it or not, and it brings you to the web UI.
OK, now see the above comment; I had to edit it a lot, sorry. Let me know if you have seen this comment, or maybe you saw it before editing...
ani-g | animefull-latest | embedding length (3, 10) | 1500 steps | lr 0.003 | cfg scale 5
The original image from the GitHub repo
My closest result to the final output
All of them were stuck in a painting style.
Prompt following the S*: light blue hair, forest, blue butterfly, cat ears, flowers, dress
An example using the same model as well.
See my message above; my results look good.
One of my best HD 3D ones (3D!), it's img2img from 2D: https://ibb.co/XXpP9mN I am adding prompts to get these results, etc.
What is interesting is that it does seem to be using the images; see her hands up the same way in two pics? Maybe that's just because the parameters are similar, though. But you can get it really modified; notice she is much different in another photo, with different clothes, etc.!
@bladedsupernova do you mind sharing your exact parameters in training the embeddings?
The original instructions are quite rough and you may not use DreamArtist correctly. You can use it following the new instructions.
Is there any suggested value for the reconstruction loss weight and negative lr weight?
Edit: and also for template files?
Reconstruction loss weight and negative lr weight: these can be set to 1.0. In fact, you can get decent results without reconstruction in most cases; adding reconstruction is much slower and the improvement is not very large.
Template files: better to use the version without filewords.
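For what it's worth, here is a toy, runnable sketch of where I understand these two weights to act (this is not the extension's actual code; every tensor and loss function below is a made-up stand-in): the reconstruction loss weight scales an extra loss term, and the negative lr weight scales the learning rate applied to the negative embedding.

```python
import torch
import torch.nn.functional as F

# Toy illustration only, NOT the extension's real code: random stand-in tensors,
# guessed loss functions. Only the weighting logic is the point.
lr, neg_lr_weight, rec_loss_weight = 0.0025, 1.0, 1.0

emb_pos = torch.randn(6, 768, requires_grad=True)   # positive embedding (6 vectors)
emb_neg = torch.randn(6, 768, requires_grad=True)   # negative embedding (6 vectors)

# two parameter groups so the negative embedding gets a scaled learning rate
opt = torch.optim.AdamW([
    {"params": [emb_pos], "lr": lr},
    {"params": [emb_neg], "lr": lr * neg_lr_weight},
])

# stand-ins for one training step's outputs (in reality these come from the
# diffusion model / VAE and depend on both embeddings)
eps_pred = (emb_pos.mean() + emb_neg.mean()) * torch.ones(4, 64, 64)
eps_target = torch.randn(4, 64, 64)
decoded = eps_pred[:3]                 # pretend "decoded image", shape (3, 64, 64)
image = torch.rand(3, 64, 64)          # pretend training image

loss = F.mse_loss(eps_pred, eps_target)                      # usual denoising loss
loss = loss + rec_loss_weight * F.l1_loss(decoded, image)    # extra reconstruction term

opt.zero_grad()
loss.backward()
opt.step()
```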
To answer the question about my exact parameters: I didn't train anything to get those great results; I simply used the link I gave. I do set it to 150 (the steps/refinement slider) and play a bit with the CFG around 7, up and down, and put in a long prompt and a negative prompt.
My ani-nahida and ani-nahida-neg are trained on animefull-latest, but training with anythingv3.0 should also give good results. You can try referring to the new instructions.
@7eu7d7 someone who trained some really good TIs actually suggested that anythingv3 doesn't do as well as a mix model of 0.2 anime-full and 0.8 WD, so I will try both. Just want to make sure: when you said no filewords, I'm assuming the prompt template file is empty? Thanks for your great work!
Just use style.txt or subject.txt. Training with filewords may exclude the described features.
@7eu7d7 I still have a few questions as I have yet to achieve much that can be called a success.
- Is the prefix attached to the embedding name in the sample there to avoid duplication with other nouns and to give it a unique noun?
- In the sample ani-nahida, embedding length is set to (3, 6) and cfg scale to 3. Is there a guideline for determining these values?
- What are the advantages of enabling the "Train with reconstruction" option?
Thanks for developing these great features.
Hi, does this conclusion about not using filewords apply to the original TI as well?
Because I've seen similar conclusions in https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/1528#discussioncomment-4044422
It's a very counter-intuitive conclusion, which makes it hard to believe, but through my tests I found that it seems to be correct.
Does this mean that almost all tutorials nowadays are wrong about filewords?
I had a decent run at replication; however, the results don't look like the training image of Nahida.
This is the .pt that 7eu7d7 provided
This is the .pt that I generated
I followed the exact instructions that @7eu7d7 suggested; the only thing that is different is probably just the model it was trained on. I speculate this difference might be due to the base model, so I will try again.
Also, is the training image of Nahida 512x512, or did you use the uncropped 1417x1417 image? Currently I am using the 1417x1417 version that I downloaded from this GitHub.
Regarding filewords: from my understanding, filewords should describe the attributes and elements of the corresponding image that you do not want the TI to learn. For example, if you want to learn a character and you have an image of the character sitting on a meadow in front of a forest, then you can include "sitting, meadow, forest" in the filewords.
In a simple experiment, I used one image cropped into different sizes as the dataset, and my filewords were simply portrait, cowboy shot, full body, etc., describing the portion of the character visible in each image, and it outperformed the plain subject.txt case.
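For anyone unfamiliar with where [filewords] comes from: in the webui's textual inversion training it is filled in from each image's caption, either a .txt file next to the image or the image filename, depending on the settings. A made-up layout matching the experiment above might look like this:

```
dataset/
  girl_portrait.png        girl_portrait.txt      -> "portrait"
  girl_cowboy_shot.png     girl_cowboy_shot.txt   -> "cowboy shot"
  girl_full_body.png       girl_full_body.txt     -> "full body"

template line:  a photo of [name], [filewords]
```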