SD-Latent-Interposer

I cannot believe this doesn't have more stars and attention!!! FREAKIN AWESOME

PurpleBlueAloeVera opened this issue • 19 comments

Thank you so much for this. Do you plan on adding SD2.1 to the features? We have some solid 2.1 models that we'd love to use as refiners for 1.5 or SDXL in our workflows.

Awesome, seriously thank you for sharing this awesome tool !

PurpleBlueAloeVera • Oct 07 '23

Glad you like it. It was just a quick one-off idea I had, and there are definitely improvements that could be made if I had the time.

As for using 2.1: last time I checked, 1.5 and 2.x latents were compatible, so technically "v1" in the dropdown is "v1/v2". That means it should work fine with 2.1 models without any changes.

city96 • Oct 07 '23

Nah, I had tried it before. SD 2.x and 1.5 are incompatible in the latent space :/

EDIT: NVM, you're right. It actually works. I don't know if it's optimal as-is, but it's very surprising to see that the latent space is compatible with 1.x. Damn.

PurpleBlueAloeVera • Oct 07 '23

Just double checked. Stable diffusion v2.1 and v2.0 both come with the same VAE, which is "ft-mse-840000" - the same one people usually use with SDv1.5. This means it is not only compatible, it's 100% the same latent format as far as I can tell. Sure, the model might be more sensitive to the noise the interposer adds, but an improved xl->v1 interposer would also mean improvements to xl->v2. And to reiterate, there's still plenty of room for improvement. I could probably even take the slightly less scuffed architecture from my latent upscaler and apply it here, though I'd like to design something better once I figure out more about how all this neural network stuff works :P

840K VAE commonly used with 1.5
SD2.1 VAE from the official repository

The file is the same. SHA256: a1d993488569e928462932c8c38a0760b874d166399b14414135bd9c42df5815
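If you want to verify that yourself, a quick sketch (the filenames are just placeholders for wherever you saved the two VAEs):

import hashlib

def sha256sum(path, chunk_size=1 << 20):
    # Hash the file in chunks so we don't load multi-GB files into memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256sum("vae-ft-mse-840000-ema-pruned.safetensors"))  # placeholder path
print(sha256sum("sd21-vae.safetensors"))                      # placeholder path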

city96 • Oct 08 '23

@city96 Thanks for your response! If I may ask, how could this be improved by the way? (The interposer, I mean.) Also, would there be a way to do this with a single .safetensors file that could hold some kind of "merge" of a 1.5 model with an SDXL one? Or would that be absolutely impossible?

Thanks in advance for your time

PurpleBlueAloeVera • Oct 10 '23

how could this be improved by the way

Well, the neural network part would have to be changed. Currently it's just a bunch of random conv2D layers that look like a spaceship. I think I have an idea on how to make a better one but yeah, time...

The other thing that needs changing is the dataset, but I think I got a decent one I can re-use from the upscaler. Which means the only other thing I'd need is, again, time to work on this :P

would there be a way to do this with a single .safetensors

You mean combining the v1->xl and xl->v1 models into a single file? I mean, that's easy enough to do I guess... You can store multiple models in the same safetensor file just fine.

city96 • Oct 10 '23

How would you do that? Store multiple models in one single safetensor file? :o

PurpleBlueAloeVera • Oct 10 '23

Same way SD does it. safetensor files just store key:value pairs (in this case, the values are the network weights). You can just add a prefix to all the keys so you can grab the ones you need while loading.

For example, all of the stable diffusion checkpoint files have a bunch of keys starting with "first_stage_model" - that's the VAE. Similarly, CLIP and the actual UNet are also stored in the same file, just with different prefixes.
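You can list those top-level prefixes for any checkpoint yourself (a quick sketch; "model.safetensors" is a placeholder for whatever checkpoint you have lying around):

from safetensors import safe_open

# safe_open only reads the header, so this is fast even for multi-GB files
with safe_open("model.safetensors", framework="pt") as f:
    prefixes = sorted({k.split(".")[0] for k in f.keys()})
print(prefixes)  # an SD v1 checkpoint has e.g. 'cond_stage_model', 'first_stage_model', 'model'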

I'd probably do something like this if I had to put both interposer models in the same file:

from safetensors.torch import load_file, save_file

# Load both interposer models as plain {key: tensor} dicts
v1_to_xl = load_file("v1-to-xl_interposer-v1.1.safetensors")
xl_to_v1 = load_file("xl-to-v1_interposer-v1.1.safetensors")

# Merge them into one dict, prefixing each key with the direction it handles
out_dict = {}
for k, v in v1_to_xl.items():
    out_dict[f"v1_to_xl.{k}"] = v
for k, v in xl_to_v1.items():
    out_dict[f"xl_to_v1.{k}"] = v

# Write the combined weights out as a single file
save_file(out_dict, "interposer-v1.1.safetensors")

List of keys before/after:

xl->v1 keys:

dict_keys(['sequential.0.bias', 'sequential.0.weight', 'sequential.2.bias', 'sequential.2.weight', 'sequential.4.bias', 'sequential.4.weight', 'sequential.6.bias', 'sequential.6.weight'])

v1->xl keys:

dict_keys(['sequential.0.bias', 'sequential.0.weight', 'sequential.2.bias', 'sequential.2.weight', 'sequential.4.bias', 'sequential.4.weight', 'sequential.6.bias', 'sequential.6.weight'])

combined output keys:

dict_keys(['v1_to_xl.sequential.0.bias', 'v1_to_xl.sequential.0.weight', 'v1_to_xl.sequential.2.bias', 'v1_to_xl.sequential.2.weight', 'v1_to_xl.sequential.4.bias', 'v1_to_xl.sequential.4.weight', 'v1_to_xl.sequential.6.bias', 'v1_to_xl.sequential.6.weight', 'xl_to_v1.sequential.0.bias', 'xl_to_v1.sequential.0.weight', 'xl_to_v1.sequential.2.bias', 'xl_to_v1.sequential.2.weight', 'xl_to_v1.sequential.4.bias', 'xl_to_v1.sequential.4.weight', 'xl_to_v1.sequential.6.bias', 'xl_to_v1.sequential.6.weight'])

Then you just split off the ones you actually need while loading, with a startswith check or a lambda or whatever.
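The loading side would then look something like this (a sketch matching the example above; the commented-out line assumes you have the interposer network instantiated as "model"):

from safetensors.torch import load_file

combined = load_file("interposer-v1.1.safetensors")

# Keep only the keys for the direction we want, stripping the prefix back off
prefix = "xl_to_v1."
state_dict = {k[len(prefix):]: v for k, v in combined.items() if k.startswith(prefix)}

# model.load_state_dict(state_dict)  # 'model' being your instantiated interposer nn.Module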

city96 • Oct 10 '23

@PurpleBlueAloeVera Figured I'd ping you: I re-trained the whole thing with a new architecture. It should work a lot better now for both xl->v1 and v1->xl. It still has some hue/saturation issues, but overall it's an improvement.

I'd appreciate it if you could re-test using it with SDv2.x models as well, since that was one of the things you said worked sub-par.

(image: INTERPOSER_V3 comparison)

city96 • Oct 11 '23

It does indeed look a LOT better here! Well done. And for sure, I'll try this ASAP and get back to you. Btw, no problem, don't hesitate to ping me if you'd like me to test/give feedback! I'm loving this thing you built. :)

PurpleBlueAloeVera • Oct 11 '23

Q: How could this allow people to port SD1.5 LoRAs into SDXL? Or is it strictly a checkpoint thing?

TomLucidor • Nov 28 '23

Q: How could this allow people to port SD1.5 LoRAs into SDXL? Or is it strictly a checkpoint thing?

I guess issue #1 kind of explains how you can do that. That's the only real way to use v1 LoRAs with XL, but obviously it won't work for concept LoRAs, only character/style ones.

(attached image: demo workflow)

city96 • Nov 28 '23

I'm just getting past the beginner stage of ComfyUI/Stable Diffusion in general, and this process is exactly what I'm looking for. I've tried many ways of installing this, and the files all seem to be in order in their directories; I just can't find a workflow. Dropping the .png into ComfyUI doesn't work. I must be missing something very obvious. Any help from anyone to get this to work would be greatly appreciated. Thanks!

Benzene82 • Jan 03 '24

@Benzene82 You mean the one in the image above? It's just a demo workflow, but if you want it, here's the JSON metadata for it: https://github.com/city96/SD-Latent-Interposer/files/13815292/SDXL_T2.json. Good thing I never delete anything lol. Feel free to reply if you have any questions.

city96 • Jan 03 '24

Thanks so much for the fast response! I wasn't looking for *this specific* workflow, I just couldn't figure out any. I can confirm the node works as expected. I'll test out other models and LoRAs to learn more about how it works. I'm trying to get away from LoRAs that basically stamp what they are trained on into a subject, making copycat images. I thought blending 1.5 LoRAs with SDXL ones might add some 'variety' and possibly more realism. Simply blending the latent images makes a weird hybrid, and I hope this node and process delivers better results. If you could send a link to the latest workflow, V3, I'd greatly appreciate it. I'm not familiar with the GitHub or Hugging Face cloning of repos, just enough to be dangerous. LOL

Benzene82 • Jan 03 '24

I'll test out other models and LoRAs to learn more about how it works. I'm trying to get away from LoRAs that basically stamp what they are trained on into a subject, making copycat images.

Yeah, that can be annoying. I mostly use my own LoRAs now but a lot of the civitai ones are overtrained like that and need a really low weight to even work, if they work at all and aren't completely incompatible lol.

I guess you could look into ControlNet, or do what I tend to do, which is to generate your base image with one model (in this case SDXL) and then img2img it (directly passing the latent via this node) on 1.5 at a high enough denoise; 0.5+ is recommended for this IMO. Some more esoteric stuff might work too, like running canny edge/OpenPose detection on the SDXL output and using that as an input for 1.5.

I thought blending 1.5 LoRAs with SDXL ones might add some 'variety' and possibly more realism. Simply blending the latent images makes a weird hybrid and I hope this node and process delivers better results.

This node is just meant to replace the need for a VAE decode/encode between SDXL/SDv1, though people have used it for some crazier stuff, like returning the leftover noise from XL and denoising it on v1 with the advanced KSampler. I guess you could convert the SDXL latent with this node and then pipe it into a latent composite/blend node together with the v1 one.

If you could send a link to the latest workflow, V3, I'd greatly appreciate it.

There's no "official" workflow for this repo. I don't really use SDXL anymore (switched to PixArt alpha for the initial image for my new stuff) but here's one of my old workflows for SDXL. It isn't very good but maybe it'll work as a starting point for you?

Link to workflow JSON

(image: SDXL workflow screenshot)

city96 • Jan 03 '24

@city96 What about object LoRAs? How would you get those across from SD1.5 to SDXL?

a lot of the civitai ones are overtrained like that and need a really low weight to even work, if they work at all and aren't completely incompatible

That is also a concern. If that is the case, what would be the strategy for LoRA cleaning with a human in the loop? RLHF/PPO, or some other alternative that reduces the amount of human judgment needed on quality?

TomLucidor • Jan 03 '24

What about object LoRAs? How would you get those across from SD1.5 to SDXL?

Masking/inpainting I guess? Maybe using a similar enough placeholder object for XL?

LoRA cleaning

Not sure what you mean. How well a LoRA works will heavily depend on what model you use it with, so there's no universal "best" weight for a given LoRA. It could work perfectly with the model it was trained on while failing miserably if the model it's applied to is different enough (As an extreme example, run a regular 1.5 LoRA on DPO or TokenCompose and see how well that turns out lol).

You also won't immediately see a pattern if it's overtrained, so it might take a bit to realize it's just spitting out variations of the training images. Detecting this would be pretty hard, as you'd need some sort of similarity score over a large batch of samples.
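The core of such a check would just be an embedding similarity matrix, something like this rough sketch (assuming you've already run the sample images through some image encoder like CLIP; the embeddings tensor here is a placeholder, not anything from this repo):

import torch

def mean_pairwise_similarity(embeddings: torch.Tensor) -> float:
    # embeddings: [N, D] tensor, one row per generated sample
    emb = torch.nn.functional.normalize(embeddings, dim=-1)
    sim = emb @ emb.T  # [N, N] cosine similarity matrix
    n = sim.shape[0]
    # Average the off-diagonal entries only (the diagonal is always 1.0)
    return ((sim.sum() - n) / (n * (n - 1))).item()

# A LoRA whose outputs score much higher here than the base model's
# (for the same prompts) is likely reproducing its training images.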

If you mean LoRA dataset cleaning, that's out of scope for this repo.

city96 • Jan 03 '24

If you mean LoRA dataset cleaning

More like generating and filtering data from a LoRA, and further refining it to be more "accurate" via human feedback (random image X is more accurate as "synthetic data" than random image Y). As for "smelling" overtraining, I'm not quite sure if there are ways to make things better (rediscovering an optimal weight, human feedback, etc.).

TomLucidor • Jan 11 '24

A bit of a side note, but X-Adapter might be just as useful: https://github.com/showlab/X-Adapter

TomLucidor • Oct 16 '24