DreamArtist-sd-webui-extension
Any successful result replication?
Hey guys, I am just wondering if anyone has successfully replicated the 1-image embedding and recreated results similar to 7eu7d7's? So far I have had no luck testing it myself.
Training the embedding takes around 2.5 hours on my 3090 GPU for 8000 steps, and the results only faintly resemble that one training image.
Haven't had any great results myself, but that is testing on my own style, which seems to be adjacent to most of the models.
It is odd that you're at 2.5 hours; I am at 40 minutes per 4000 steps.
For replication it may also matter whether xformers is enabled for textual inversion; if it was, that may make a 1-to-1 replication impossible.
Edit: adding comparison images.
Original
What it generates
I've tried with 15 images and with 5 images and not had any success with it learning a person. I think there are far too many variables with no explanation of what they should be set to.
The embedding can influence image generation, but fails to replicate my waifu. Need some more testing probably.
I used 22 images, but they are basically one image flipped and cropped into different sizes with some small variations, for example with or without a hat. I got decent results using 6 vectors for both the positive and negative prompt, learning rate 0.0025, cfg scale 3, reconstruction loss weight 1, negative lr weight 1, a custom prompt template that does not use filewords, image size 384x384, and 10000 steps.
I tried without filewords in the template and it heavily copied the backgrounds along with the subject, to the point that sometimes it would just generate a landscape without any people at all. That was at 6000 steps or so before I canceled.
My image has a simple white background, so maybe that's why. Also, 7eu7d7 once mentioned that this algorithm works best when clip skip is set to 1 instead of 2, which most people seem to use nowadays. I've always used clip skip 2, but maybe you can check that.
I should also mention that even though the negative prompt seems to improve the general quality, it may also cause mosaic or distorted shapes in the background and distorted hands, as in my example.
Edit: And I should add that I didn't use filewords only because it always throws an error when I try to.
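For context, a prompt template that does not use filewords is just a plain-text file of prompt lines containing the [name] token and no [filewords] token. An illustrative example (not the actual file used above) could look like this:

```
a photo of [name]
a portrait of [name], white background
[name], full body, standing
a close-up of [name]
```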
I tried training with a single image; weirdly, I don't see any obvious visual improvements.
Every parameter here is the same except for using Anythingv3 for the model
Clip skip 1, anythingv3 vae, no hypernetwork, xformer on, reconstruction on
The prompt template file is: [name] with purple eyes and purple hair wearing a purple kimono outfit standing in a field of flowers with a purple sword in her hand and a purple butterfly flying around her, by Masaaki Sasamoto
The training image is the character Raiden Shogun from Genshin.
my preview results every 500 steps: https://imgur.com/a/3qe6Fb3 my loss: https://imgur.com/a/unEBSD2
Nothing resembles the training image; most notably, the subject is standing in the training image but most previews show her sitting.
Attempting to replicate Nahida right now, since working on my own subject has failed various times.
This is just my experience, but you probably shouldn't include any feature you want the model to learn in the template.
Tried it today with photos of myself and got similar results: some generations would just render a landscape, and others showed a lot of identity loss (3/6 vectors, 0.005 lr, 3000 steps).
For those who can generate valid images in the training logs but fail to replicate them in txt2img: if your embedding file is called zzzz1234, your prompt is "art by zzzz1234", not just the file name. You can see this prompt during training.
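To illustrate (zzzz1234 is just the hypothetical embedding name from the comment above, and the -neg counterpart follows the ani-nahida / ani-nahida-neg naming that shows up later in this thread):

```
prompt:          art by zzzz1234, 1girl, forest, flowers
negative prompt: zzzz1234-neg, lowres, bad anatomy
```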
I tried again training with a single image, in an attempt to replicate the results of ani-nahida that 7eu7d7 has shown in the README. I used the exact same image to train and named this embedding ani-greengod.
Every parameter is the same except for using Anythingv3 as the model: clip skip 1, anythingv3 VAE, no hypernetwork, xformers on, reconstruction on. I also followed the suggestion from @zhupeter010903 and used an empty template.
The results definitely don't look like those from the ani-nahida.pt that 7eu7d7 provided.
my preview results for every 500 steps: https://imgur.com/a/LyQqYmM the results of my replication attempt: https://imgur.com/a/nUxRnjh
Here's a comparison using only the TI, without any other text prompt.
Same for me: at 5000 steps all I got is something that resembles the girl, but nothing of the detail shown in the readme. If this isn't going to work, it's quite sad honestly.
Guys... I've gotten it to work great, even for a Mario video game case and cool aliens by H.R. Giger, giving me similar photos!
Take a look at my results. I have very, very good ones, though I think all I can post here is safe-for-work photos. You gotta see them! Anyway, here is just one for show. Input pic: https://ibb.co/hZrs0jq Results (similar images): https://ibb.co/dkkW0qy Very HQ images, kind of similar: https://ibb.co/nC85xGp
All you do is go here: https://www.kaggle.com/code/miolovers1/stable-diffusion-automatic1111 and replace the upper !git clone line with this: !git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui. The app link appears in the second-to-last cell, believe it or not, and it brings you to the web UI.
OK, now see the above comment; I had to edit it a lot, sorry. Let me know if you have seen this comment, or maybe you saw it before editing...
ani-g | animefull-latest | embedding length (3, 10) | 1500 steps | lr 0.003 | cfg scale 5
The original image from the GitHub repo
My closest result to the final output
All of them were stuck in a painting style.
Prompt following the S*: light blue hair, forest, blue butterfly, cat ears, flowers, dress
An example using the same model as well.
See my message above; my results look good.
One of my best HD 3D ones (3D!), it's img2img from 2D: https://ibb.co/XXpP9mN I am adding prompts to get these results, etc.
What is interesting is that it does seem to be using the images; see her hands up the same way in two pics? Maybe that's just because the parameters are similar, though. But you can get it really modified; notice she is much different in another photo, with different clothes, etc.!
@bladedsupernova do you mind sharing your exact parameters in training the embeddings?
The original instructions are quite rough and you may not use DreamArtist correctly. You can use it following the new instructions.
Is there any suggested value for the reconstruction loss weight and negative lr weight?
Edit: and also for template files?
Reconstruction loss weight and negative lr weight: these can be set to 1.0. In fact, you can get decent results without reconstruction in most cases; adding reconstruction is much slower and the improvement is not very large.
Template files: better to use the version without filewords.
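For what it's worth, here is a toy, runnable sketch of where I understand these two weights to act (this is not the extension's actual code; every tensor and loss function below is a made-up stand-in): the reconstruction loss weight scales an extra loss term, and the negative lr weight scales the learning rate applied to the negative embedding.

```python
import torch
import torch.nn.functional as F

# Toy illustration only, NOT the extension's real code: random stand-in tensors,
# guessed loss functions. Only the weighting logic is the point.
lr, neg_lr_weight, rec_loss_weight = 0.0025, 1.0, 1.0

emb_pos = torch.randn(6, 768, requires_grad=True)   # positive embedding (6 vectors)
emb_neg = torch.randn(6, 768, requires_grad=True)   # negative embedding (6 vectors)

# two parameter groups so the negative embedding gets a scaled learning rate
opt = torch.optim.AdamW([
    {"params": [emb_pos], "lr": lr},
    {"params": [emb_neg], "lr": lr * neg_lr_weight},
])

# stand-ins for one training step's outputs (in reality these come from the
# diffusion model / VAE and depend on both embeddings)
eps_pred = (emb_pos.mean() + emb_neg.mean()) * torch.ones(4, 64, 64)
eps_target = torch.randn(4, 64, 64)
decoded = eps_pred[:3]                 # pretend "decoded image", shape (3, 64, 64)
image = torch.rand(3, 64, 64)          # pretend training image

loss = F.mse_loss(eps_pred, eps_target)                      # usual denoising loss
loss = loss + rec_loss_weight * F.l1_loss(decoded, image)    # extra reconstruction term

opt.zero_grad()
loss.backward()
opt.step()
```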
To answer the question about my exact parameters: I didn't train anything to get those great results; I simply used the link I gave. I do set it to 150 (the steps/refinement slider) and play a bit with the CFG around 7, up and down, and put in a long prompt and a negative prompt.
My ani-nahida and ani-nahida-neg are trained on animefull-latest, but training with anythingv3.0 should also give good results. You can try referring to the new instructions.
@7eu7d7 someone who trained some really good TIs actually suggested that anythingv3 doesn't do as well as a mix model of 0.2 anime-full and 0.8 WD, so I will try both. Just want to make sure: when you said no filewords, I'm assuming the prompt template file is empty? Thanks for your great work!
Just use style.txt or subject.txt. Training with filewords may exclude the described features.
@7eu7d7 I still have a few questions as I have yet to achieve much that can be called a success.
- Is the prefix attached to the embedding name in the sample there to avoid duplication with other nouns and to give it a unique noun?
- In the sample ani-nahida, embedding length is set to (3, 6) and cfg scale to 3. Is there a guideline for determining these values?
- What are the advantages of enabling the "Train with reconstruction" option?
Thanks for developing these great features.
Hi, does this conclusion about not using filewords apply to the original TI as well?
Because I've seen similar conclusions in https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/1528#discussioncomment-4044422
It's a very counter-intuitive conclusion, which makes it hard to believe, but through my tests I found that it seems to be correct.
Does this mean that almost all tutorials nowadays are wrong about filewords?
I had a decent run at replication; however, the results don't look like the training image of Nahida.
This is the .pt that 7eu7d7 provided
This is the .pt that I generated
I followed the exact instructions that @7eu7d7 suggested; the only thing that is different is probably just the model it was trained on. I speculate this difference might be due to the base model, so I will try again.
Also, is the training image of Nahida 512x512, or did you use the uncropped 1417x1417 image? Currently I am using the 1417x1417 version that I downloaded from this GitHub.
Regarding filewords: from my understanding, filewords should describe the attributes and elements of the corresponding image that you do not want the TI to learn. For example, if you want to learn a character and you have an image of the character sitting on a meadow in front of a forest, then you can include "sitting, meadow, forest" in the filewords.
In a simple experiment, I used one image cropped into different sizes as the dataset, and my filewords were simply portrait, cowboy shot, full body, etc., describing the portion of the character visible in each image, and it outperformed the plain subject.txt case.
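For anyone unfamiliar with where [filewords] comes from: in the webui's textual inversion training it is filled in from each image's caption, either a .txt file next to the image or the image filename, depending on the settings. A made-up layout matching the experiment above might look like this:

```
dataset/
  girl_portrait.png        girl_portrait.txt      -> "portrait"
  girl_cowboy_shot.png     girl_cowboy_shot.txt   -> "cowboy shot"
  girl_full_body.png       girl_full_body.txt     -> "full body"

template line:  a photo of [name], [filewords]
```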