KSampler: use the same noise samples across a batch
Using an ancestral sampler on a batch currently yields results that can only be reproduced at the same index in a batch of the same size. For example, if you want to reproduce a single result from a batch generation, you have to re-run the entire batch.
This change ensures that the same noise samples are used for every latent in the batch, leading to consistent, reproducible results regardless of batch size or position in the batch. In addition, the noise is now a view of a C×H×W tensor, which uses less memory than a full B×C×H×W tensor.
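As a rough illustration of the mechanism (not the actual diff; the function name is hypothetical), a shared-noise view can be built with PyTorch's `expand`:

```python
import torch

# Hypothetical sketch: draw a single C×H×W noise sample, then expand it across
# the batch dimension. expand() returns a view, so no B×C×H×W copy is made.
def batch_shared_noise(latent, generator=None):
    b, c, h, w = latent.shape
    noise = torch.randn((c, h, w), generator=generator,
                        dtype=latent.dtype, device=latent.device)
    return noise.expand(b, c, h, w)  # every batch item sees the same noise
```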
This may address some nondeterminism/reproducibility bugs like #2799 or #4876, potentially others.
Probably need to think carefully about the possible ramifications of doing this because it's not necessarily straightforward.
For one thing, it will make all existing workflows that use batches with ancestral/SDE samplers impossible to reproduce. People often use batches for img2img, inpainting, and stuff like FaceDetailer. You'd generally use moderate denoise for those types of applications, so having every batch item share the same noise with ancestral/SDE samplers might reduce diversity and make batches less useful in those cases.
This may help with #2799, but it won't do anything for #4876, which is kind-of sort-of band-aid fixed by the current generator-based approach in the default noise sampler. I say kind-of sort-of because it only fixes samplers that use the default noise generator, and it causes some other problems.
Personally, I think I'd like to see this as something I could opt into when necessary, or at least opt out of. This could probably be implemented relatively easily as a separate feature in a sampler wrapper. Example: https://github.com/blepping/ComfyUI-sonar - the SamplerConfigOverride node in py/nodes.py.
One way to make batch sampling consistent without having to generate the same batch (with some caveats) would be a sampler wrapper similar to the node I mentioned that takes parameters for batch size and index(es) and just returns those noise batch items. This doesn't avoid the performance cost of generating more noise than is actually used, but for the most part only the model call matters and everything else is barely a rounding error for performance/memory use. Just to put it in context, Flux has 16 channels and 8x spatial compression, so a 1024x1024 Flux latent is 262,144 items. If they're 32-bit floats, that's 4 bytes each, meaning a batch of 16 1024x1024 Flux latents would come to a whopping... 16MB.
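To make the wrapper idea concrete, here's a rough sketch of what such a noise sampler could look like; the names and signature are illustrative, not an actual ComfyUI or ComfyUI-sonar API:

```python
import torch

def make_batch_slice_noise_sampler(shape, virtual_batch_size, indexes, seed):
    # Illustrative only: draw noise as if for the full virtual batch so the
    # RNG stream matches the original batched run, then keep only the rows
    # for the batch items actually being reproduced.
    _, c, h, w = shape
    gen = torch.Generator().manual_seed(seed)

    def noise_sampler(sigma, sigma_next):
        full = torch.randn((virtual_batch_size, c, h, w), generator=gen)
        return full[list(indexes)]  # discard the rows we don't need

    return noise_sampler
```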
Diversity should come from using different seeds for some part of the process, no? Expecting the same inputs and same seed to give different results because they're batched together feels more like finding a use for a bug than correct behavior to me.
(I'm not super experienced with all of this though, so it's entirely possible that I'm not understanding the intent behind random seeds as used here - I'm more accustomed to seeds being used to ensure reproducibility in a context like unit testing.)
> Diversity should come from using different seeds for some part of the process, no?
Well, setting the seed doesn't do anything directly; it's only when you generate random numbers and perturb the process with them that you get varying results.
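For example, in plain PyTorch (nothing ComfyUI-specific):

```python
import torch

# The seed only matters once random numbers are actually drawn from it.
torch.manual_seed(42)
a = torch.randn(4)
torch.manual_seed(42)
b = torch.randn(4)
assert torch.equal(a, b)  # same seed, same draws -> identical noise
```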
Not sure how much you know about this, so if it's stuff you already know I just want to mention I'm not trying to be condescending or anything. There are (generally) two places where the random numbers factor into the process:
- When the initial noise is added to the latent. This is (roughly) going to be `input_latent + noise * first_sigma_in_schedule`. For text2image, where there's no existing image, `input_latent` will just be a latent full of zeros.
- During sampling, if the sampler is ancestral or SDE. These samplers (simplifying a bit) overshoot the step slightly (the degree is based on whatever ETA is set to) and then add some new noise so that the end of the step has a noise level matching the expected amount for the next step. (Both cases are sketched below.)
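Here's a minimal sketch of both cases in plain Python; the ancestral step split mirrors `get_ancestral_step` from k-diffusion, and the function names are just for illustration:

```python
def add_initial_noise(input_latent, noise, first_sigma):
    # Case 1: noise the input latent up to the start of the schedule.
    # For text2image, input_latent is all zeros.
    return input_latent + noise * first_sigma

def get_ancestral_step(sigma_from, sigma_to, eta=1.0):
    # Case 2: split a step into a deterministic part (down to sigma_down)
    # plus fresh noise scaled by sigma_up, so the step still ends at the
    # expected noise level sigma_to.
    sigma_up = min(sigma_to,
                   eta * (sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2)
                          / sigma_from ** 2) ** 0.5)
    sigma_down = (sigma_to ** 2 - sigma_up ** 2) ** 0.5
    return sigma_down, sigma_up
```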
The second case applies to your changes, but only for ancestral samplers. It seems like you aren't handling SDE, which would probably be confusing if this gets merged. Anyway, my biggest concern was that this would change all existing batch generations using ancestral samplers (that use the default noise sampler) and people wouldn't be able to reproduce those workflows anymore. I personally don't think that's something that should be done lightly unless there is a pretty big upside or a way to opt out.
Note that I am just a random node author, I have no authority so you're free to ignore me if you want.