Document Flux2Pipeline latents shape

Open preetam1407 opened this issue 2 weeks ago • 0 comments

Fixes #12755.

This PR documents the expected shape of the latents argument in Flux2Pipeline.__call__.

For the default AutoencoderKLFlux2 VAE used by FLUX.2, the VAE first applies 8× spatial compression, and the pipeline then applies a 2×2 patch packing step. This results in:

an effective 16× downsampling in height and width (8 × 2), and
4× more channels in the latent space, since each 2×2 patch is folded into the channel dimension.

The expected shape for user-provided latents is therefore:

(batch_size, 128, height // 16, width // 16)

where height and width are the requested output image size. Passing latents with a different shape leads to shape mismatches inside the VAE and transformer.
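The shape arithmetic above can be sketched as a small helper. This is only an illustration, not code from the pipeline: the function name `expected_latents_shape` and its parameters are hypothetical, and the default of 32 VAE latent channels is an assumption inferred from 128 / 4 rather than something stated in this PR.

```python
def expected_latents_shape(
    batch_size: int,
    height: int,
    width: int,
    vae_latent_channels: int = 32,  # assumption: inferred from 128 / (2 * 2)
    vae_scale_factor: int = 8,      # 8x spatial compression in the VAE
    patch_size: int = 2,            # 2x2 patch packing in the pipeline
) -> tuple[int, int, int, int]:
    """Return the latents shape described in this PR (hypothetical helper)."""
    # Spatial dimensions shrink by the VAE factor times the patch size: 8 * 2 = 16.
    downsample = vae_scale_factor * patch_size
    # Each 2x2 patch is folded into channels: 4x more channels.
    channels = vae_latent_channels * patch_size * patch_size
    return (batch_size, channels, height // downsample, width // downsample)


# A 1024x1024 request maps to a 64x64 latent grid with 128 channels.
print(expected_latents_shape(1, 1024, 1024))
```

The returned tuple could, for example, be passed to `torch.randn` to create a tensor for the latents argument. The helper uses the same floor division as the shape formula above and does not check that height and width are multiples of 16.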

Tests

Docs-only change; no functional behavior modified.
Verified that providing latents of shape (1, 128, H // 16, W // 16), where H and W are the requested height and width, runs end-to-end with the FLUX.2-dev checkpoint.

Dec 08 '25 13:12 preetam1407