diffusers
Document Flux2Pipeline latents shape
Fixes #12755.
This PR documents the expected shape of the latents argument in Flux2Pipeline.__call__.
With the default AutoencoderKLFlux2 VAE used by FLUX.2, the VAE first compresses the image 8× spatially, and the
pipeline then packs each 2×2 patch of latents into the channel dimension. This results in:
- an effective 16× downsampling in height and width, and
- 4× as many channels as the VAE latent space (128 after packing).
The expected shape for user-provided latents is therefore:
(batch_size, 128, height // 16, width // 16)
where height and width are the requested output image size. Passing latents with a different shape leads to shape
mismatches inside the VAE and transformer.
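The shape arithmetic above can be sketched as a small standalone helper. This is an illustration only, not part of the pipeline: `expected_latents_shape` is a hypothetical name, and the default of 32 VAE latent channels is an assumption inferred from the 128 packed channels divided by the 2×2 patch size.

```python
def expected_latents_shape(batch_size, height, width,
                           vae_channels=32, vae_scale=8, patch_size=2):
    """Return the (B, C, H, W) shape described in this PR for user-provided latents.

    Hypothetical helper. The defaults are assumptions: 8x VAE compression and
    2x2 packing come from the PR text; 32 VAE channels is inferred from 128 / 4.
    """
    # 8x VAE compression followed by 2x2 packing gives 16x spatial downsampling.
    factor = vae_scale * patch_size
    # Each 2x2 patch is folded into the channel dimension: 4x the channels.
    channels = vae_channels * patch_size ** 2
    return (batch_size, channels, height // factor, width // factor)


print(expected_latents_shape(1, 1024, 1024))  # (1, 128, 64, 64)
```

A tensor with this shape (for example, one created with `torch.randn`) is what the `latents` argument is expected to receive.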
Tests
- Docs-only change; no functional behavior modified.
- Verified that providing latents of shape (1, 128, H // 16, W // 16) runs end-to-end with the FLUX.2-dev checkpoint.