stable-diffusion-webui-embedding-editor
Possibility of outputting the values to CSV for analysis in Excel?
I've been testing out the sliders and have been able to achieve some results, such as making the background disappear, changing the age of the subject (however irrelevant it was to the actual embedding being trained), or making legs disappear or appear.
However, it would be interesting to do a deeper analysis of the values and how the sliders function, along with being able to track their changes while training the embedding.
I don't know much about how to go about this, but I'd assume that since it is possible to get the values into the sliders, it would also be possible to get them into a file.
Thus far this has actually been a REALLY interesting thing to use, and I encourage further development of it. The only suggestion I'd give as another feature, if you need an idea, is to give us the ability to set a slider to the value that is highlighted, since doing it by hand is cumbersome. My limited testing gives me the feeling that this would be a really powerful adjustment tool.
Yeah currently it's not super easy to track what each slider does, though it's possible that they only really do things in combination with other sliders. One possible method for now would be to go through each slider copying the weight, changing the weight, creating an output, and then restoring the weight and moving to the next one, to create 768 images (one for each slider). It would be very slow though.
One of my goals is to do that automatically, and maybe detect how much impact it had on the image to then print out a ranking of the weights from most impactful to least.
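Roughly, the automated scan and ranking could look something like the sketch below. This is only an illustration, not the extension's real code: generate_image and the numpy array it returns stand in for whatever call actually renders a txt2img result with the embedding applied, and the embedding is assumed to be a plain (detached) 768-value tensor.

def rank_weight_impact(embedding, generate_image, delta=0.05):
    # render once with the unmodified embedding as a reference
    baseline = generate_image(embedding).astype(float)
    impacts = []
    for i in range(embedding.shape[0]):
        original = embedding[i].item()
        embedding[i] = original + delta            # nudge one weight
        image = generate_image(embedding).astype(float)
        embedding[i] = original                    # restore before moving to the next
        # crude impact score: mean absolute pixel difference from the baseline
        impacts.append((i, float(abs(image - baseline).mean())))
    # most impactful weights first
    return sorted(impacts, key=lambda pair: pair[1], reverse=True)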
Another idea is to create a master slider which pulls all the sliders towards another embedding's weights (e.g. to start moving from puppy to skunk, since an embedding halfway between each embedding creates a very valid puppy-skunk hybrid creature).
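As a minimal sketch of that master slider, assuming both embeddings are plain 768-value torch tensors (the function name is just illustrative):

import torch

def blend_towards(current: torch.Tensor, target: torch.Tensor, t: float) -> torch.Tensor:
    # t is the master slider: 0.0 = current embedding unchanged,
    # 0.5 = the halfway hybrid mentioned above, 1.0 = fully the target embedding
    return torch.lerp(current, target, t)  # current * (1 - t) + target * t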
Right so the power of this tool is quite something.
I made an empty embedding with an init text of TestiTestiTestiTesti and 1 vector. This generated some sort of snowscape with a car. I fiddled around with the weights and decided to make something that is apparently really hard for an AI: a person wearing a diaper. So... 4 hours of testing sliders later, I managed to achieve that. However, the line between a male- and female-looking subject is thin on the sliders. All sliders affect the image DIRECTLY, but their effects are not obvious.
I was more successful doing this manually than training an embedding.
Now, I got a bit bored around the 3-hour mark and there wasn't enough stuff on Netflix to justify going on.
Oh that's interesting, the way you've used different tag words has proven useful in the way I was hoping.
The initial embedding of vector size 1 would probably only have contained the embedding for the word 'test', and the rest would have been lost due to not having enough vectors. However a lot of concepts can be done in one vector if you find the right spot (as you've shown), and working with more than one vector would make it significantly harder with more weights to manage.
It could be a good idea to generate 2 or 3 images at once with a low step count as well (10-20), because even with embeddings which normally work, stable diffusion can often output wildly different results.
Many of the initial image descriptions used to train stable diffusion were pretty bad, which is why a lot of poses often give opposite results to what you want, and probably the same for a person wearing a diaper, so just finding a better input embedding can give much more reliable results.
I tried with 2 vectors, on an embedding I trained at 256x256 for 10,000 steps. That is actually a legit way to start the base framework for an embedding, I have learned, since it keeps "unnecessary" detail from coming up, and as you get the lower resolution to match what you want it to learn, you can increase the resolution. This is the only way I have managed to train ANYTHING worth a damn with TI.
The reason I switched to 1 vector was that I wanted to get a test done; I had also forgotten the init words of the embedding and couldn't be arsed to figure them out again. However, my brief test with the 2-vector embedding was equally successful. After reducing all the values to about as close as I could get them with a mouse, I got basically exactly the base image I was expecting it to make if I had prompted that word straight up in txt2img, even though the init words were not those.
However, during the testing I started to notice some patterns along these lines: a set number of weights seemed to correspond to controlling primarily what happens in a specific area. They did influence the whole image together, but I noticed that if I played around with weights at the very bottom of the list, it mostly affected things like legs, if there was a human subject. Somewhere around the 200-300 mark I started to notice that changes in the values affected the appearance of an arm being held on the waist, in that the arm started to form as a half circle. Then some clearly affected the height of the belly button, and others whether the subject had a shirt, bra, cleavage, a hand, or such at the top of the frame. You can see these in the examples I showed. Somewhere around the 100 mark I noticed that changes affected either side of the butt (in this case). For a long time I struggled to get rid of what I could only describe as a butthole/vagina/dick forming in the crotch area, and that is when I added a guide marker for "butt", which I later changed to "belly".
But every slider seems to adjust a certain portion of the image and the pattern in that region. The changes can be very clear or not obvious at all. HOWEVER, in the generation process you can see that adjusting them kind of pulls the whole picture with them. For example, many different sliders, when adjusted, changed the subject from facing towards the camera to having their back turned.
So it is clear that these weights don't adjust context, but rather what gets fetched to fill those areas. For example, when I started the path to making the person wearing a diaper, for the longest time I had a baby on a bed. After fiddling with a weight really low down in the list, I managed to get the bed to disappear, and suddenly the subject that was vaguely human-looking was what I could describe as "standing". Then I decided to test them out by doing 3 rows from the top and 3 rows from the bottom, generating an image at every slider (since it took only about half a second). If you look at the attached image, the top right was obvious: once I got the red horizontal to stop pulling anything, the bed disappeared. Once I adjusted the pink horizontal, the subject managed to get legs and appear standing; removing them quickly turned the subject into a sort of... old man with a baby body wrapped with a white blanket into a burrito-looking thing, or a baby rendered in the shape of an egg. The center vertical line was obvious in terms of what the sliders did to it, but where it started and ended on the weights I couldn't say. The orange triangle from the bottom left is the slider I noticed adjusts the belly button, since it affected everything else near it as it moved. The bottom right has the circles that I sort of noticed had a thing. I suspect there were more big circles, such as one arcing from each corner, but I can't say for sure. This is the "circle" that, when adjusted, seems to make hands appear. The smaller 4 circles were for butt/hips and shoulder/breast, and the green one in the middle seemed to... exist.
What makes this particularly hard is that if you happened to find a weight which you thought adjusted the red vertical line in this picture example, then some sliders seemed to, in a way, rotate the subject within their influence in 3D space. I noticed this when I started to get something that appeared to be a flat nipple on the right (where the hand is on the chest in some examples): it rotated into the rendering on a vertical axis from one slider and a horizontal axis from another, much as if you'd skew it in Photoshop or rotate a 2D drawing of something in front of you.
Then again, I only tested this with one seed. However, I then ran a quick batch of around 100 images, and the pictures which were about the same as the one I was optimising were decent, in the sense that similar elements existed in them in similar ways. If the perspective or other context changed in the picture, then the results were just as random, though still vaguely what could be described as having a "man/person" and a "diaper". So it is clear that some sliders do affect the actual formation of those things overall, or that if something vaguely like that appeared in the middle of the picture (and it was always in the middle), then it seemed to get pulled in as it was. However, the prompt didn't "spread" to fill the image in that situation.
So it is clear we can control things. I suspect it would be fairly easy to eliminate things from an embedding. For example, if you'd trained it to have a specific T-shirt, you could erase hands or the head from showing up as part of the embedding, since it was clear that I was able to basically just erase legs from existence easily.
A bit more, since I didn't address you saying that I might be generating the same person wearing a diaper. This is something I have noticed in other things too: I often keep getting basically the same faces when prompting characters. For all practical purposes I'd consider these "the same person", or at the very least brothers. These were fetched in a massive batch of images with Craig Davison, Leyendecker and Mucha as the style influence. In a set of about 1000 pictures of different themes and seeds (since I do things in big batches and refine them later), I'd say there were 6 unique faces for both men and boys, while for female subjects I'd say at best 4 overall, just 4 slightly different generic model/pretty-woman faces. This might just be me noticing it since I do like 1000 images in a batch and then go through them.
Hrm I'm not sure I follow what you mean with the colored shapes and the circles. One seed might give wildly different results to other seeds though, so testing it all with only one seed might make it hard to judge the true impact of various weights.
Something you could try is 'a picture of a man wearing a EmbeddingName', and then also have markers for other types of underwear (ideally if they're only 1 vector long, which you can check in the txt2img prompt window by seeing how the count changes when you add that word). If those markers have places which are similar, they might indicate the concept of clothing worn in that area, whereas the things which are different might indicate texture, volume, etc. It might take moving several weights to cause a specific change though.
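If you want to check vector lengths outside the prompt box, something like this should work. It assumes the SD 1.x tokenizer (openai/clip-vit-large-patch14) loaded via the transformers library; other checkpoints may tokenize differently, and the word list is just an example:

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["diaper", "underwear", "underpants", "briefs", "panties", "shorts"]:
    # add_special_tokens=False drops the start/end tokens so only the word itself is counted
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    print(word, len(ids), tokenizer.convert_ids_to_tokens(ids))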
Since there are multiple valid embeddings to get a given result, it's also possible that the embeddings for other types of clothing aren't all in the same area at all, but just manage to activate the same parts of the model due to how they work in combination. I'm not sure how the CLIP model was trained, but given that the puppy/skunk middle point worked so well, it seemed like there must be some coherent meanings to these sliders across all embeddings.
What I tested last night indicated that putting a slider between two different things you want results more in a blending of those two than a mixing. This leads to "ghosts" of these elements hanging about, like translucent shirts and such appearing. Generally, the way to replicate a concept from the original model, such as the diaper I used in this example, is to take that token and aim ALL the sliders as close to it as you can. This way you will get exactly the "purest" example of that token from the AI, which in this case was some sort of mess of skin, baby face(s), on a "bed", with white blankets enveloping it.
However, I specifically had a male subject in a diaper as a target, since a female subject was something that seemed to follow from "diaper" more easily than a male one. After adding the markers, I realized that for some sliders you have to aim at the cluster of the most relevant tokens, while for others, if the picture starts to overshoot, you need to go towards the opposite tokens, or sometimes just default the slider to 0, or to either extreme. What is also particularly curious is that not all tokens appear on all the sliders.
What might be worth testing is making the interface for this better, as in making the sliders bigger. Then list all the "wrong" tokens that you want to steer away from, or the "right" tokens you want to steer towards, or both somehow, and manually adjust the embedding away from the bad tokens.
Because with clothing, CLIP constantly confuses underwear, underpants, brief(y) (it wants to add a "y" as an extra token to "briefs" for some reason), panties, "cloth wrapped around the waist", and shorts, for example. I know this since I have spent a lot of time trying to get subjects into the correct kind of clothing. There are plenty of other similar issues that I won't go into here.
More testing is needed, and also a better interface to do the testing with, since those small sliders are hard to play with efficiently and the weight list is hard to navigate.
However, it would be nice to get the Excel output for the weights AND the relevant token targets onto that sheet as well. That way it would be really easy to target and find the average of two tokens with simple equations in Excel.
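One rough way to put numbers on that confusion would be to compare the raw CLIP token embeddings of those clothing words directly. This is only a sketch: it looks at the input token table alone and ignores the rest of the text encoder, so it's a crude proxy for "confusability" at best, and it assumes the SD 1.x text model (openai/clip-vit-large-patch14):

import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-large-patch14"  # SD 1.x text encoder
tokenizer = CLIPTokenizer.from_pretrained(name)
token_table = CLIPTextModel.from_pretrained(name).get_input_embeddings().weight  # [vocab, 768]

def word_vector(word):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    return token_table[ids].mean(dim=0)  # average the pieces if the word splits into several tokens

words = ["underwear", "underpants", "briefs", "panties", "shorts", "diaper"]
vectors = torch.stack([word_vector(w) for w in words])
# pairwise cosine similarity between all the word vectors
similarity = torch.nn.functional.cosine_similarity(vectors.unsqueeze(1), vectors.unsqueeze(0), dim=-1)
for i, w in enumerate(words):
    print(w, [round(similarity[i, j].item(), 3) for j in range(len(words))])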
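As a sketch of what that export could look like, assuming the embedding weights and two reference token vectors are already available as 768-value torch tensors (the function, file name, and column labels are just placeholders):

import csv

def export_weights_csv(path, embedding, token_a, token_b):
    # embedding, token_a, token_b are assumed to be 768-value torch tensors
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "embedding", "token_a", "token_b"])
        rows = zip(embedding.tolist(), token_a.tolist(), token_b.tolist())
        for i, (w, a, b) in enumerate(rows):
            # one row per weight index, so a spreadsheet formula like =AVERAGE(C2,D2) works per row
            writer.writerow([i, w, a, b])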
This is a really interesting discussion. If we had a 768 X 768 grid we could patch clusters of 'semantic meaning' together and have another set of sliders to raise/lower the values/outputs. The Reaper DAW has a conceptually similar patch bay for audio.
I asked the dev about a csv export from Embedding Inspector yesterday. https://github.com/tkalayci71/embedding-inspector/issues/10
I copied the slider floor/ceiling values out of your Embedding Editor HTML source code for a female character embedding into Excel this morning so I could do some plots. (I'm going to raise a ticket about Floor/Ceiling values - I'm not sure if they are relevant or just a GUI thing)
With the other guy's help it works!
# save hack in embedding_editor.py, Line 253, inside def save_embedding_weights
# (the with block closes the file automatically, so no explicit close is needed;
# note that str() of a large tensor may print a truncated representation)
with open('emb_tensors.txt', 'w') as f:
    f.write(str(weights))