
[Feature Request]: Built-in noise offset

tkalayci71 opened this issue Mar 19 '23 · 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

An option ranging from -1 to +1, which then adds a pure black/white image to the latent noise. Currently this can be achieved by externally creating such an image and using it for img2img (denoising strength = 1.0), but this is not convenient, especially when doing batch img2img and the source image is already being used for ControlNet.

Using the epiNoiseOffset_v2 LoRA does not give the same results:

  • no noise offset (default)
  • txt2img with <epiNoiseOffset_v2:4>
  • img2img with pure black
  • txt2img with <epiNoiseOffset_v2:-4>
  • img2img with pure white

Parameters:

silk brocade
Steps: 10, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 0, Face restoration: CodeFormer, Size: 512x768, Model hash: 20bae33336, Model: realisticVisionV13_v13, Denoising strength: 1, Mask blur: 4, ControlNet Enabled: True, ControlNet Module: none, ControlNet Model: control_openpose-fp16 [9ca67cc5], ControlNet Weight: 1, ControlNet Guidance Start: 0, ControlNet Guidance End: 1

To avoid breaking things: when this setting is 0, do nothing (because adding VAE-encoded mid-grey is not exactly the same as adding nothing/zero). When it's -1, create and VAE-encode a black image and add it to the initial noise; when it's +1, do the same with a white image, etc. This option is for both txt2img and img2img, and should be easy to implement right after generating the latent noise.
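For illustration, a rough sketch of what this could look like (a hypothetical helper, not actual webui code; `sd_model` is assumed to be the loaded LDM model and `noise` the initial latent noise tensor):

import torch

def apply_initial_noise_offset(sd_model, noise, offset):
    # hypothetical helper: offset in [-1, 1]; -1 biases toward black, +1 toward white
    if offset == 0:
        return noise                                    # 0 means "do nothing", so existing seeds are unchanged
    value = 1.0 if offset > 0 else -1.0                 # pure white or pure black in [-1, 1] pixel space
    img = torch.full((noise.shape[0], 3, noise.shape[2] * 8, noise.shape[3] * 8),
                     value, device=noise.device, dtype=noise.dtype)
    target = sd_model.get_first_stage_encoding(sd_model.encode_first_stage(img))
    return noise + abs(offset) * target                 # blend the encoded constant image into the initial noise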

Thank you.

tkalayci71 avatar Mar 19 '23 14:03 tkalayci71

Influencing only the 0th step does not truly achieve what you're after, since the denoiser quickly 'recovers' from that initial bump and keeps doing its best to generate an image that is grey on average. The right way to do it is to darken/brighten the latent over multiple steps, at the beginning, middle, or end of the denoising schedule. I'm making an extension that does that and much more. Here are some examples of what can be achieved this way:

Prompt: a pure white polar bear at full noon daylight on a field of white snow, award-winning wildlife photography antarctica
Vanilla: (image) / Extension: (image)

Prompt: a black panther feline at full moon midnight, award-winning wildlife photography black jungle background, strong rim light moonlight
Vanilla: (image) / Extension: (image) / Extension (very dark settings): (image) / Extension (different schedule w/ color correction): (image)

In the above examples, the darkening/lightening starts at around t=0.10 (not from 0, so that the output keeps the same structure and composition as the vanilla output, for comparison) and ends at around t=0.50 (which is the parameter that controls how dark/bright the final outcome will really be).

P.S.: Examples are made with sd-v1-5-ema and no offset noise LoRAs.
P.P.S.: Color correction is also done during the same process. This can basically be used to bias your output towards any color, not just black and white.
P.P.P.S.: One more example (middle is vanilla): (image)

muerrilla avatar Mar 22 '23 03:03 muerrilla

@zahand Very nice examples, I'll try your extension when it's available. However, I also like the unique results I get with only initial noise offset, you could easily include that mode in your extension. Also, will it work with other scripts like xyz plot, and will options be saved in generation parameters? I hope so. Cheers!

ghost avatar Mar 22 '23 07:03 ghost

@zahand Very nice examples, I'll try your extension when it's available. However, I also like the unique results I get with only initial noise offset, you could easily include that mode in your extension. Also, will it work with other scripts like xyz plot, and will options be saved in generation parameters? I hope so. Cheers!

Thank you. It will be possible to apply the adjustment only at step 0 and it saves the options in the PNG info. Exposing the parameters to other scripts is beyond my knowledge (not really a programmer here), so that will be up to the community. I will release it in the coming days, as soon as I write some documentation for it. I'll let you know.

muerrilla avatar Mar 22 '23 11:03 muerrilla

Influencing only the 0th step does not truly achieve what you're after, since the denoiser quickly 'recovers' from that initial bump and keeps doing its best to generate an image that is grey on average.

Huh? Theoretically, the reason unmodified SD won't generate dark images is that the model assumes the added white noise has zero mean, so the average color of the input image should already be correct.

In my experiments, adding a bias to the initial latent has a drastic effect. If it's pure black (or even darker, by going past the (-1, 1) range the VAE expects) I often get black images back. Make it impossibly white and SD 1.5 likes to hit a corner of the training data distribution. (this is a standalone script for the CompVis codebase, not a webui script)


If you're using webui's img2img feature, that doesn't actually let you run the full schedule; even at 100% denoising strength it skips one step.

IIRC the img2img code from the original codebase expects this and screws up indexing if you try to use the full schedule with it.


My suggestions:

  1. Enable denoising strength 1 to run all steps. Even if the indexing error I mentioned is (still) in webui, it should be an easy fix.
  2. For quickly adding a noise offset, provide a checkbox and a color picker (a rough sketch follows after this list). It's no more work or clutter than a slider (Gradio has a ColorPicker component), a toggle is more convenient than using the slider's center to turn it off, and doing something totally different at 0.01 vs. 0 is confusing.
  3. Hope Gradio adds a paintbucket some time? ツ
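For what it's worth, a rough sketch of what that UI could look like in Gradio (labels and layout here are purely illustrative, not webui's actual settings):

import gradio as gr

with gr.Blocks() as demo:
    enable_offset = gr.Checkbox(label="Add noise offset", value=False)
    offset_color = gr.ColorPicker(label="Offset color", value="#000000")
    offset_strength = gr.Slider(minimum=0.0, maximum=1.0, value=0.5, label="Offset strength")
    # the generation code would only apply the offset when the checkbox is ticked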

drdaxxy avatar Mar 30 '23 02:03 drdaxxy

Huh? Theoretically, the reason unmodified SD won't generate dark images is that the model assumes the added white noise has zero mean, so the average color of the input image should already be correct.

Did you actually read my comment and look at my examples? I know the theory of offset noise very well and have used it in training. The point here was that although manipulating the initial latent does have a drastic effect (as demonstrated by OP), it does not have the expected effect, which would be the same as using offset noise during training (or using an offset noise LoRA at inference time): allowing you to create very bright and very dark images with acceptable quality.

P.s: I'm interested in seeing your results, but the image in your link is not working.

muerrilla avatar Mar 30 '23 10:03 muerrilla

P.s: I'm interested in seeing your results, but the image in your link is not working.

No, that "Image Temporarily Unavailable" is what SD 1.5 generates (for many seeds) when you make it denoise an impossibly bright image (by running numbers above 1 through the VAE) captioned "A dark alleyway in a rainstorm" :^)

I tried your prompts, including the negative prompts in the PNG info. Didn't hunt for the LoRAs, though, or match webui's RNG.

(white grid comparison image)

(black grid comparison image)

Left is my approach (with "normal" white and black), middle is the LoCon approximation of the "Offset Noise" blog post's model, right is vanilla.

Code TL;DR: it's just standard img2img with K-diffusion, but without skipping steps and with a constant-color init image.

# same as the gist: `model` is the LatentDiffusion model, `model_wrap` / `model_wrap_cfg`
# are the k-diffusion wrappers, `extra_args` carries the conditioning
import torch
import k_diffusion as K

# constant-color init image in [-1, 1] pixel space, VAE-encoded into the init latent
rgb = torch.tensor([r, g, b], device="cuda", dtype=torch.float16)[None, :, None, None].tile(batch_size, 1, height, width)
x_T = model.get_first_stage_encoding(model.encode_first_stage((rgb - 0.5) * 2))

# K-diffusion: build a full Karras schedule (no steps skipped) and add noise on top of the init latent
sigma_min, sigma_max = model_wrap.sigmas[[0, -1]].cpu().tolist()
sigmas = K.sampling.get_sigmas_karras(steps, sigma_min=sigma_min, sigma_max=sigma_max, device="cuda")
x_T = x_T + (sigmas[0] * torch.randn_like(x_T))
x_0 = K.sampling.sample_dpmpp_2m(model_wrap_cfg, x_T, sigmas, extra_args=extra_args)  # any k-diffusion sampler works here

That said, I just compared this to webui img2img, or skipping one step in my code, and the results I get aren't too different -- I don't see the problem?

Is it that img2img still produces bright spots (like in my panther samples)? In that case you don't just want a low mean -- a picture that's 90% black and 10% white has average RGB (0.1,0.1,0.1) -- but also other statistics like range or variance. Your extension looks good for this - how are you controlling the latents during the process?

Besides color-correcting, lower CFG also helps. These days quality isn't as much of a problem anymore IMO, and the "middle brown" effect you'd usually get is the latent space mean, so img2img helps avoid it (see my first row).

drdaxxy avatar Mar 31 '23 01:03 drdaxxy

No, that "Image Temporarily Unavailable" is what SD 1.5 generates (for many seeds) when you make it denoise an impossibly bright image (by running numbers above 1 through the VAE) captioned "A dark alleyway in a rainstorm" :^)

Haha I had no idea, and didn't even notice the AI-ishness of the image! lol

I tried your prompts, including the negative prompts in the PNG info. Didn't hunt for the LoRAs, though, or match webui's RNG.

Nice, thanks for the examples. Your method surpasses my expectations, which is surprising because I thought I had basically tried the same thing (well, not literally; I did it by offsetting the init noise of txt2img, but still it's the same principle, no?) but it never worked as well. Your results are pretty good in the lower CFG range (I mean, all of them are still better than the LoRA examples IMO), but that's very limiting in the sense of prompt adherence and overall creativeness of output.

Is it that img2img still produces bright spots (like in my panther samples)? In that case you don't just want a low mean -- a picture that's 90% black and 10% white has average RGB (0.1,0.1,0.1) -- but also other statistics like range or variance. Your extension looks good for this - ...

Yes, that's kinda it. I'm not sure the model has any bias towards a specific variance etc.; I think it's still the bias towards a grey mean that is at work here. So the model is trying to reach a 0 mean image at every step, but the amount of denoising per step is small. So if you manipulate the latent to be darker than average at one step, the sampler can't really overcome it in that step, but give it more steps and it will eventually do its best to create an image with a 0 mean. But since at the early steps the overall latent was dark (forced by you), the larger features of the image (the obvious one being the mean of the image) have already been decided to be dark, so the sampler overcompensates by making smaller features way brighter, getting the mean closer to 0. Hope I explained that well enough.

Also as you mentioned, with this method you can only push the latent to a certain point before you "break it". So you can't always make it as dark or as bright as you want without wiping out the whole thing.

... how are you controlling the latents during the process?

My method is super simple: make a schedule that starts at 1 and goes to 0 over the sampling steps, and use it as a multiplier for applying the darkening/brightening/coloring (yes!) at every sampling step (done inside the denoiser callback). Now make that schedule more interesting than that, and you get more interesting results.

Here's an example of what my schedule looked like in the examples above. Note that the start of the schedule is delayed, so the very first steps of sampling are identical to the unmodified version and the structure of the output stays comparable to the original. (schedule plot)
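To make that concrete, here is a minimal, hypothetical sketch of applying a scheduled offset from a k-diffusion-style per-step callback (this is not the extension's actual code; `offset_latent`, `grey_latent` and the linear ramp are assumptions):

import torch

def make_offset_callback(offset_latent, grey_latent, strength, start=0.10, end=0.50, total_steps=30):
    # offset_latent: VAE encoding of the target color image (e.g. pure black or white)
    # grey_latent:   VAE encoding of a mid-grey image, used as the neutral reference
    def callback(d):  # k-diffusion samplers call this once per step with x, i, sigma, denoised
        t = d["i"] / max(total_steps - 1, 1)                  # normalized progress in [0, 1]
        if t < start or t > end:
            return                                            # delayed start / early cutoff of the schedule
        w = strength * (1.0 - (t - start) / (end - start))    # ramps from `strength` down to 0
        d["x"] += w * (offset_latent - grey_latent)           # in-place nudge of the running latent
    return callback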

Besides color-correcting, lower CFG also helps. These days quality isn't as much of a problem anymore IMO, and the "middle brown" effect you'd usually get is the latent space mean, so img2img helps avoid it (see my first row).

I avoid the middle brown by first encoding a real grey image, which gives me a "grey latent" (which is not an all zeros tensor). I subtract that from the 'x' latent, do the "color" adjustments on 'x', then add it back.

Bonus: You can target specific channels of the 'x' latent (instead of the whole thing) and apply different adjustments to them in order to do crazier stuff like cross-processing effect, contrast, saturation and color balance adjustments, desharpening of images that become too sharp in the last steps, etc.
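As a hypothetical illustration of the grey-latent trick plus per-channel targeting (the gains, and what each channel controls, are made up for illustration, not a documented property of the SD latent space):

def adjust_latent_channels(x, grey_latent, gains):
    # gains: one multiplier per SD latent channel (4 channels in SD 1.x)
    delta = x - grey_latent            # work relative to the encoded grey, not the zero tensor
    for c, g in enumerate(gains):
        delta[:, c] *= g               # scale each channel independently
    return grey_latent + delta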

muerrilla avatar Mar 31 '23 14:03 muerrilla

but still it's the same principle, no?

Yeah. Generally, text2img and img2img are the same except img2img (normally) skips ahead in the schedule and adds an image to the initial input. Webui's tab might do some preprocessing, but my code doesn't use that, so we should be talking about the exact same thing.

...Unless the functionality you're using to edit the initial latent sits between sampler and model, not before the sampler.

In that case, as far as the sampler's concerned, your process is part of the (one-step noise prediction) model and only changes the predictions. I dunno what APIs webui has here, but that could be your problem.

So the model is trying to reach a 0 mean image at every step

The model predicts white noise with 0 mean and time-dependent variance at every step. All it does during training is one step of img2img at a time, effectively, for a random strength level and without CFG. "Offset noise" training modifies the noise distributions to have random means, and others even throw out the rule that it's Gaussian. Without these, the only reason the process "tries" to generate 0-mean images in standard text2img is that standard text2img starts with nothing where the source image should be.
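For reference, the usual offset-noise training tweak from the "Offset Noise" blog post is essentially a one-line change to the noise used during training (here `latents` is assumed to be the batch of VAE-encoded training images):

# standard training uses zero-mean noise; offset noise adds a random per-image, per-channel mean
noise = torch.randn_like(latents)
noise = noise + 0.1 * torch.randn(latents.shape[0], latents.shape[1], 1, 1, device=latents.device)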

CFG complicates things, but even with low/no CFG you can still get bright shades in a dark image and I don't think that's always overcompensating. Pictures with a flat background are usually compositions, not untouched photos. Try adding a small amount of noise to the pixel-space input, or generally something more like a real image.

drdaxxy avatar Apr 01 '23 23:04 drdaxxy

Yo, I had actually implemented Perlin noise in DreamBooth some time ago, but couldn't really interpret what its effect was (besides that of the offset noise) and then forgot about it. It seemed to make training kinda faster or something. Wanna help figure this thing out? 😁 Here: https://www.reddit.com/r/StableDiffusion/comments/11d9p2j/comment/jai61wb/?utm_source=share&utm_medium=web2x&context=3

muerrilla avatar Apr 02 '23 02:04 muerrilla