
Intuition behind this node

CesarERamosMedina opened this issue 1 year ago

First of all, thank you for this node! It is awesome. I would love to understand the intuition behind the simplest version of this node and what it is doing under the hood. Please correct me if I'm wrong:

Given the guidance scale, the current latents, the current unconditional noise prediction, and the current conditional noise prediction, this node will:

  1. Calculate what the noise prediction with classifier-free guidance would be, using the normal guidance scale
  2. Figure out where the noise prediction with CFG is overblown by:
  • 2.a) Looking at where the signs of the conditional noise prediction match those of the difference between the conditional and the unconditional noise prediction
  • 2.b) Looking at where the signs of the conditional noise prediction match those of the noise prediction with CFG
  • 2.c) (optional) Looking at where the signs of the noise prediction with CFG match those of the difference between the noise prediction with CFG and the latents
  Only where those three conditions coincide do you determine that the noise prediction with CFG is probably overblown
  3. Calculate a "new" noise prediction with classifier-free guidance, but the scale you use is now the skimming scale
  4. Where the conditions are satisfied, replace the conditional noise prediction with the difference between the original conditional noise prediction and the skimmed noise prediction, scaled by the guidance scale (a sketch of all four steps follows this list)
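To check my own understanding, here is a minimal PyTorch sketch of the four steps as I have described them. Everything in it is reconstructed from the description above: the function name `skimmed_cfg`, the `skimming_scale` argument, and the exact replacement formula in step 4 are my assumptions, not the node's actual code.

```python
import torch

def skimmed_cfg(x, cond, uncond, cond_scale, skimming_scale, use_latent_check=True):
    # Step 1: plain CFG prediction at the normal guidance scale.
    cfg_pred = uncond + cond_scale * (cond - uncond)

    # Step 2: flag values where that prediction may be overblown.
    # 2.a) cond agrees in sign with (cond - uncond)
    mask = cond.sign() == (cond - uncond).sign()
    # 2.b) cond agrees in sign with the CFG prediction
    mask &= cond.sign() == cfg_pred.sign()
    # 2.c) (optional) the CFG prediction agrees in sign with (cfg_pred - x)
    if use_latent_check:
        mask &= cfg_pred.sign() == (cfg_pred - x).sign()

    # Step 3: the same combination, but at the lower skimming scale.
    skimmed_pred = uncond + skimming_scale * (cond - uncond)

    # Step 4: where all conditions hold, pull cond back by the difference
    # between the two predictions divided by the guidance scale.
    return torch.where(mask, cond - (cfg_pred - skimmed_pred) / cond_scale, cond)
```

One reading of step 4 that makes the scaling consistent: wherever the mask holds, `uncond + cond_scale * (new_cond - uncond)` works out to exactly `skimmed_pred`, so the sampler's full-scale CFG lands on the skimmed value at those positions and is untouched everywhere else.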

The part I cannot get an intuition for is why you also do this for the unconditional noise prediction using the already-processed tensor, rather than the original unconditional noise prediction.

CesarERamosMedina · Oct 02 '24

For your step-by-step explanation, I would rather sum up the process as: "values from one prediction that push so hard in the opposite direction through the other prediction that they end up being visible in the denoised result".

And if you can stand my MSPaint and Comic Sans MS level of mastery, the base image I had in mind while writing this was more or less this: [image]

So it is not that it is probably overblown under those conditions, it is that it is absolutely overblown. But "overblown" would be a misleading term, since what kills it all is the negative prediction ending up upside down in the end result.

And for your question, you mean the second time? If so, then it is because, "conceptually", after removing the contradictions on one side, doing the same the other way around cannot be destructive anymore, and it lets you avoid overblown values even among those that are "going the right way".
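In code terms, the order would be roughly this, reusing the illustrative `skimmed_cfg` sketch from above (the symmetric second call is my simplification of the idea, not the exact implementation):

```python
# First pass: skim cond against uncond.
cond = skimmed_cfg(x, cond, uncond, cond_scale, skimming_scale)
# Second pass: skim uncond against the already-skimmed cond. Because the
# first pass has already removed the contradictions on the cond side, this
# second pass cannot be destructive anymore.
uncond = skimmed_cfg(x, uncond, cond, cond_scale, skimming_scale)
```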

The whole thing is hard to grasp, and I know it: after writing it, I tried to re-derive it through intuition alone, without visualising the entire process, and absolutely could not get any result (I wanted to make sure that I wasn't overcomplicating things).

Extraltodeus · Oct 06 '24

This all makes sense! Love the MSPaint diagram. Thank you for the reply. When doing this, did you find that working with the noise predictions was better? (as opposed to working with the denoised latents)

CesarERamosMedina · Oct 06 '24

I think it does not change much in the end.

Extraltodeus · Oct 12 '24