
[Request]: Implement SDXL Latent Fixing tweaks?

Open SLAPaper opened this issue 1 year ago • 6 comments

This technique is introduced in https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space, which gives three different settings to correct the SDXL latent space. It may be good for testing.
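For reference, a minimal sketch of the general shape of one of the corrections described there (soft-clamping latent outliers), assuming a standard SDXL latent tensor of shape [batch, 4, H/8, W/8]; the threshold/ceiling defaults here are illustrative, not the blog's exact values:

```python
import torch

def soft_clamp(latent: torch.Tensor, threshold: float = 3.5, ceiling: float = 4.0) -> torch.Tensor:
    # Compress magnitudes beyond `threshold` so they approach `ceiling`
    # asymptotically instead of being hard-clipped, preserving the relative
    # ordering among the outliers. Defaults are illustrative only.
    sign = latent.sign()
    mag = latent.abs()
    over = mag > threshold
    compressed = threshold + (ceiling - threshold) * torch.tanh(
        (mag - threshold) / (ceiling - threshold)
    )
    return torch.where(over, sign * compressed, latent)

# e.g. fixed = soft_clamp(latent["samples"]) on a ComfyUI LATENT dict's tensor
```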

SLAPaper avatar Dec 04 '23 17:12 SLAPaper

This is an interesting tweak.

ltdrdata avatar Dec 05 '23 00:12 ltdrdata

Pretty sure these are implemented here: https://github.com/Clybius/ComfyUI-Latent-Modifiers

TingTingin avatar Dec 05 '23 06:12 TingTingin

They are.
I've been examining both the look and the histograms of the examples (I need to download the node and generate some 16bpc outputs or EXR), and from the point of view of a photographer this technique seems to hurt the overall "mood" of the images in most cases. I did a relative colorimetric conversion to ProPhoto RGB 16bpc integer, since the provided JPEGs are completely blown out at both the high and low ends of the file. ProPhoto makes sure everything possible is covered, but it doesn't help viewing, since a huge portion of it lies outside the gamut of any display device ever made or is made up of imaginary colors; it does, however, get rid of the "treat as sRGB" default for the untagged files. None of them have ICC profiles, so I can only guess what the normalized (which I hope is something that happened) input colorspace to the latents was before conversion to their working space (L*a*b*? YUV? ITP? I don't use any of the bizarre two-color-component spaces enough to recognize them from just the color encodings they talked about, so I don't know if some standard color matrix conversion for the three non-"detail" dimensions was used, or if it was just sorta "let's find something that produces pleasing colors").
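(For anyone wanting to reproduce the conversion step, a rough sketch using Pillow's ImageCms; it assumes sRGB for the untagged sources, the filenames and ProPhoto ICC path are placeholders, and Pillow only works at 8bpc here, so the 16bpc version still needs a dedicated editor:)

```python
from PIL import Image, ImageCms

src = Image.open("sdxl_example.jpg").convert("RGB")  # untagged, so sRGB is assumed
srgb = ImageCms.createProfile("sRGB")
prophoto = ImageCms.getOpenProfile("ProPhoto.icm")   # placeholder path to a ProPhoto RGB ICC file
to_wide = ImageCms.buildTransform(
    srgb, prophoto, "RGB", "RGB",
    renderingIntent=ImageCms.INTENT_RELATIVE_COLORIMETRIC,
)
wide = ImageCms.applyTransform(src, to_wide)
wide.save("sdxl_example_prophoto.tif")
```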

Unfortunately, color management has always been a mess and needs to be a start-to-finish process... which means there needed to be one here (and AFAIK there wasn't). I only say that because not all the images on the web are sRGB, and thanks to Facebook mass-re-assigning the Display P3 (D65) profile to everything and stripping the original EXIF years ago, and other sites like Pinterest stripping ICC and EXIF, there's no way to spider the web and actually get any information about what colorspace anything was supposed to be in. You can't really assume sRGB anymore; there are tons of photos from phones floating around that got untagged somehow and are using a larger space, and it's impossible to detect what something was supposed to be tagged as after it's been un-tagged.

The blog post mentions not enough blue in the images and poor white balance, but then shows a bunch of examples that shouldn't have an overall white balance that's actually white, because of the environment. There were only a couple that looked better on my calibrated monitor after the adjustment. The ones that didn't have their colors shifted into a mismatch ended up losing mood because of flat brightness increases. I shoot with a Sony a7R IV these days, and its "Auto" white balance setting has two sub-settings: one that attempts to white balance the image, and one that avoids shifting it too much, to capture the actual colors of the scene under tinted lighting. I never use the first, and I switch to other white balances if I'm either using flashes or the camera isn't picking up a good balance because of overwhelming color not coming from light sources (a blanket of green foliage in the woods when it's overcast can start shifting things, for example).

Without doing some kind of upconversion, the histogram is completely clipped in multiple channels and on both ends. A good example from the linked blog (after I converted to 16bpc integers) is below. Especially notice that the overwhelming patches of yellow towards the bright end didn't really change; they just got bigger / more even. That end of the histogram is nearing white anyway, so it's just blowing more things out. The blue sort of split up, and a huge spike of it migrated into that ultra-bright zone:

[images: SDXL Original with its histogram vs. Hard Modification with its histogram]
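(The histograms themselves are easy to reproduce; a rough sketch, with a placeholder filename standing in for a 16bpc export:)

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

img = np.asarray(Image.open("sdxl_original_16bpc.tif"))  # placeholder: HxWx3 16-bit export
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ch, (ax, name) in enumerate(zip(axes, ["red", "green", "blue"])):
    ax.hist(img[..., ch].ravel(), bins=256, color=name)  # clipping shows as spikes at the ends
    ax.set_title(name)
fig.savefig("histograms.png")
```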

To describe where it's easiest to see: the original generation was a backlit sunset with warm diffuse yellow light, with pink tones from the clouds providing secondary diffuse light. The blue of the sky is tempered by atmospheric haze from the low angle of the sun, and the front of the subject is somewhat dark, because you're just not going to expose an image perfectly in that situation without a fill flash or without blowing out more of the sky than just the sun. Here's an output of all three channels side by side, with the corresponding color values (16-bit) for the sky shown as text, for the uncorrected version. The sky is already what you'd expect from a bright light cyan leaning towards blue. The important part is how much darker it looks in the blue section, because that's the usual response of the eyes, even though the blue channel is brighter than the other two by enough to be visible when they're all viewed as greyscale instead.

[image: 01_split_RGB — the three channels side by side as greyscale]
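(To reproduce that split, roughly; the filenames are placeholders and this sketch stays at 8bpc:)

```python
import numpy as np
from PIL import Image

img = np.asarray(Image.open("sunset.png").convert("RGB"))  # placeholder source image
h, w, _ = img.shape
strip = Image.new("L", (w * 3, h))
for c in range(3):
    # Paste each channel as greyscale so their relative brightness is comparable
    strip.paste(Image.fromarray(img[..., c]), (c * w, 0))
strip.save("split_rgb.png")
```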

I'm glad this is being worked towards, because IMO there's plenty to be fixed with getting non-blown-out colors and appropriate lighting from nearly every model, but something seems off with these too... or rather, there's something wrong with the scenes, but the corrections applied weren't the right ones. (See above about not white balancing a sunset with pink skies shot nearly head-on.) I'm going to mess around with the node.

NeedsMoar avatar Dec 09 '23 00:12 NeedsMoar

> They are. I've been examining both the look and the histograms of the examples [...]

I'm kinda confused here... are you judging its quality based on a few images that you converted the color space for?

Because that, if anything, will wreck a picture/image. Relative colorimetric is the best one can do, but a conversion is a conversion is a conversion.

What you are seeing is not at all a good demonstration of the results. GitHub compresses photos. I just uploaded mine with all of its metadata; it was a 27 MB DNG with a DP3 profile. What I got back was a 1.3 MB JPG that still had all of the metadata attached to it, including the DP3 profile (or at least the info; I don't know how it is displayed, and I'm not a good tester because I have my Chromium (experimental) settings set to show the best available profile).

So there's no way of knowing unless you try it yourself. But I'll be testing it and uploading it to Flickr or a cloud that'll preserve everything. I'll be back with a report.

I'd personally love to output 10-bit color images with a DP3 (or better) color profile/dynamic range. This is presuming that Stability AI managed to preserve all the training photos/pictures in their original formatting. I don't know the inner machinations of the VAE used.

Does anyone happen to know what is being output / what is capable of being output? I'm kinda ignorant of what takes up that 32-bit space, whoops.
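(For the profile half of that wish, a minimal sketch of embedding a Display P3 ICC profile into a PNG with Pillow; the profile path is a placeholder, and it only makes sense if the pixel values were actually encoded in Display P3 first:)

```python
from PIL import Image

img = Image.open("generation.png")      # placeholder output image
with open("DisplayP3.icc", "rb") as f:  # placeholder path to a Display P3 profile
    icc = f.read()
# Embeds the profile (an iCCP chunk); it does NOT convert the pixel data itself
img.save("generation_p3.png", icc_profile=icc)
```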

BuildBackBuehler avatar Dec 11 '23 00:12 BuildBackBuehler

I implemented this as a WebUI extension for testing: https://github.com/SLAPaper/sd-webui-sdxl-latent-tweaking. I might try to write a custom node in my spare time.

SLAPaper avatar Dec 11 '23 06:12 SLAPaper

> I'm kinda confused here... are you judging its quality based on a few images that you converted the color space for?
>
> Because that, if anything, will wreck a picture/image. Relative colorimetric is the best one can do, but a conversion is a conversion is a conversion.

Converting to larger spaces and higher bit depths for histogram preview is fairly standard. Lightroom's histogram is based on a mapping to Melissa RGB, which is nearly as large as ProPhoto. I mainly looked at the images first; the conversion was to make the histogram readable.

> What you are seeing is not at all a good demonstration of the results. GitHub compresses photos. [...]

I was kind of hoping that was the case. I still haven't gotten around to it because some node keeps breaking diffusers installs, and I've been building some flash-attention wheels for Windows and doing running around for the holidays. There is a 32-bit OpenEXR node for Comfy, BTW. Presumably --force-fp32 needs to be used to keep precision loss from happening in the latents.
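(Roughly what I'd expect saving full-precision output to look like; a sketch with imageio, assuming an EXR-capable backend is installed, with a placeholder array standing in for the decoded image:)

```python
import numpy as np
import imageio.v3 as iio

# Placeholder: a float32 HxWx3 array straight from the VAE decode,
# before any 8-bit quantization (run with --force-fp32 to avoid fp16 loss)
decoded = np.random.rand(512, 512, 3).astype(np.float32)
iio.imwrite("output.exr", decoded)  # needs an EXR backend, e.g. imageio's freeimage plugin
```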

I imagine you'd get nearly 32bpc usable.

Most output from SDXL especially looks like it should have been linear or log output that could then be tonemapped down to a lower color depth via whatever method looks best, then edited with Lightroom / Capture One style raw editing tools.
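(A tonemap in that style could be as simple as a global Reinhard operator; this is just one illustrative choice, not anything from the blog:)

```python
import numpy as np

def reinhard(linear: np.ndarray, exposure: float = 1.0) -> np.ndarray:
    # Global Reinhard: maps linear radiance [0, inf) into [0, 1) smoothly,
    # so highlights roll off instead of clipping
    x = linear * exposure
    return x / (1.0 + x)

# e.g. quantize a float EXR `hdr` for display after tonemapping:
# ldr = (reinhard(hdr) ** (1 / 2.2) * 255).round().astype(np.uint8)
```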

> So there's no way of knowing unless you try it yourself. But I'll be testing it and uploading it to Flickr or a cloud that'll preserve everything. [...]
>
> Does anyone happen to know what is being output / what is capable of being output?

That's so.

NeedsMoar avatar Dec 17 '23 16:12 NeedsMoar