
Intuition behind this node

CesarERamosMedina opened this issue 1 year ago

First of all, thank you for this node! It is awesome. I would love to understand the intuition behind the simplest version of this node and what it is doing under the hood. Please correct me if I'm wrong:

Given the guidance scale, the current latents, the current unconditional noise prediction, and the current conditional noise prediction, this node will:

  1. Calculate what the noise prediction with classifier-free guidance would be, using the normal guidance scale
  2. Figure out where the noise prediction with CFG is overblown by:
  • 2.a) Looking at where the signs of the conditional noise prediction match those of the difference between the conditional and the unconditional noise prediction
  • 2.b) Looking at where the signs of the conditional noise prediction match those of the noise prediction with CFG
  • 2.c) (optional) Looking at where the signs of the noise prediction with CFG match those of the difference between the noise prediction with CFG and the latents
  Only where those three conditions coincide do you determine that the noise prediction with CFG is probably overblown
  3. Calculate a "new" noise prediction with classifier-free guidance, but the scale you use is now the skimming scale
  4. Where the conditions are satisfied, replace the conditional noise prediction with the difference between the original conditional noise prediction and the skimmed noise prediction, scaled by the guidance scale (a sketch of all four steps follows this list)
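To check my own understanding, here is a minimal PyTorch sketch of the four steps as I have described them. Everything in it is reconstructed from the description above: the function name `skimmed_cfg`, the `skimming_scale` argument, and the exact replacement formula in step 4 are my assumptions, not the node's actual code.

```python
import torch

def skimmed_cfg(x, cond, uncond, cond_scale, skimming_scale, use_latent_check=True):
    # Step 1: plain CFG prediction at the normal guidance scale.
    cfg_pred = uncond + cond_scale * (cond - uncond)

    # Step 2: flag values where that prediction may be overblown.
    # 2.a) cond agrees in sign with (cond - uncond)
    mask = cond.sign() == (cond - uncond).sign()
    # 2.b) cond agrees in sign with the CFG prediction
    mask &= cond.sign() == cfg_pred.sign()
    # 2.c) (optional) the CFG prediction agrees in sign with (cfg_pred - x)
    if use_latent_check:
        mask &= cfg_pred.sign() == (cfg_pred - x).sign()

    # Step 3: the same combination, but at the lower skimming scale.
    skimmed_pred = uncond + skimming_scale * (cond - uncond)

    # Step 4: where all conditions hold, pull cond back by the difference
    # between the two predictions divided by the guidance scale.
    return torch.where(mask, cond - (cfg_pred - skimmed_pred) / cond_scale, cond)
```

One reading of step 4 that makes the scaling consistent: wherever the mask holds, `uncond + cond_scale * (new_cond - uncond)` works out to exactly `skimmed_pred`, so the sampler's full-scale CFG lands on the skimmed value at those positions and is untouched everywhere else.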

The part I cannot get an intuition for is why you also do this for the unconditional noise prediction using the already-processed tensor, rather than the original unconditional noise prediction.

CesarERamosMedina · Oct 02 '24

For your step-by-step explanation, I would rather sum up the process as: "values from one prediction that push so hard in the opposite direction through the other prediction that they end up being visible in the denoised result".

And if you can stand my MSPaint and Comic Sans MS level of mastery, the base image I had in mind while writing this was more or less this: [image]

So it is not that it is probably overblown under those conditions, it is that it is absolutely overblown. But "overblown" would be a misleading term, since what kills it all is the negative prediction ending up upside down in the end result.

And for your question, you mean the second time? If so, then it is because, "conceptually", after removing the contradictions on one side, doing the same the other way around cannot be destructive anymore, and it lets you avoid overblown values even among those that are "going the right way".
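In code terms, the order would be roughly this, reusing the illustrative `skimmed_cfg` sketch from above (the symmetric second call is my simplification of the idea, not the exact implementation):

```python
# First pass: skim cond against uncond.
cond = skimmed_cfg(x, cond, uncond, cond_scale, skimming_scale)
# Second pass: skim uncond against the already-skimmed cond. Because the
# first pass has already removed the contradictions on the cond side, this
# second pass cannot be destructive anymore.
uncond = skimmed_cfg(x, uncond, cond, cond_scale, skimming_scale)
```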

The whole thing is hard to grasp, and I know it: after writing it, I tried to re-derive it through intuition alone, without visualising the entire process, and absolutely could not get any result (I wanted to make sure that I wasn't overcomplicating things).

Extraltodeus · Oct 06 '24

This all makes sense! Love the MSPaint diagram. Thank you for the reply. When doing this, did you find that working with the noise predictions was better? (as opposed to working with the denoised latents)

CesarERamosMedina · Oct 06 '24

I think it does not change much in the end.

Extraltodeus · Oct 12 '24