Sana checkpoint trained with SD-3 VAE

Open srikarym opened this issue 11 months ago • 3 comments

Hi, thank you for open-sourcing your code and trained models. Could you release a Sana text-to-image model trained with either the SD-XL or the SD-3 VAE?

srikarym • Dec 04 '24 23:12

Yes, DC-AE is fast, but it ruins image details. We also tried f64, with the same result. We tried both realistic and anime images: https://imgsli.com/MzI0MDg3
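To put a number on side-by-side comparisons like the one linked above, a pixel-level metric such as PSNR can be computed between an original image and its VAE reconstruction. This is a minimal sketch, assuming both images are already flattened into equal-length sequences of 8-bit channel values; PSNR is a simpler proxy, not the rFID the Sana paper reports, and the `psnr` helper is hypothetical rather than what was used for the imgsli comparison.

```python
import math


def psnr(original, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstruction) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all channel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


print(psnr([10, 20, 30, 40], [10, 20, 30, 40]))  # inf
print(round(psnr([10, 20, 30, 40], [12, 18, 30, 40]), 1))  # 45.1
```

Higher PSNR means a closer reconstruction; it tends to penalize blurred fine detail, which is the kind of loss described here, but it does not capture perceptual quality as well as learned metrics.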

Maybe take a look at the AuraFlow VAE? As far as I know it's open source, comparable to the Flux/SD3 VAE, and significantly better than DC-AE.

[Image: AuraFlow VAE (_auraflow)]

[Image: DC-AE (_auraf64)]

Or maybe train a small model that converts DC-AE latents directly into AuraFlow latents? What do you think, is this possible?
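One way to picture the proposed converter is through shape bookkeeping: DC-AE f32 latents sit on a grid 4 times coarser (per side) than an f8 VAE's, so a converter would have to upsample spatially as well as remap channels. A minimal sketch, assuming (hypothetically) a network whose last layer emits `tgt_c * scale**2` channels per DC-AE position, followed by a depth-to-space (pixel shuffle) step. The helper names, the f8 target, and the 16-channel target width are illustrative assumptions, not the AuraFlow VAE's documented layout.

```python
# Hypothetical output head for a DC-AE -> target-VAE latent converter.
# Assumption: the converter predicts tgt_c * scale**2 channels at every DC-AE
# latent position, then depth-to-space rearranges them onto the finer target grid.


def converter_plan(src_f, src_c, tgt_f, tgt_c):
    """Spatial upscale and output-head width needed to map src latents to tgt latents."""
    if src_f % tgt_f:
        raise ValueError("source downsampling factor must be a multiple of the target's")
    scale = src_f // tgt_f
    return {"in_channels": src_c, "upscale": scale, "head_channels": tgt_c * scale * scale}


def depth_to_space(inp, scale):
    """Pixel shuffle on nested lists (channels x H x W).

    Follows the torch.nn.PixelShuffle index convention:
    out[k][y * scale + i][x * scale + j] = inp[k * scale * scale + i * scale + j][y][x]
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    if c_in % (scale * scale):
        raise ValueError("channel count must be divisible by scale**2")
    c_out = c_in // (scale * scale)
    out = [[[0.0] * (w * scale) for _ in range(h * scale)] for _ in range(c_out)]
    for k in range(c_out):
        for i in range(scale):
            for j in range(scale):
                src = inp[k * scale * scale + i * scale + j]
                for y in range(h):
                    for x in range(w):
                        out[k][y * scale + i][x * scale + j] = src[y][x]
    return out


# DC-AE f32c32 source; an f8, 16-channel target is an illustrative assumption.
print(converter_plan(src_f=32, src_c=32, tgt_f=8, tgt_c=16))
# {'in_channels': 32, 'upscale': 4, 'head_channels': 256}
print(depth_to_space([[[1]], [[2]], [[3]], [[4]]], 2))
# [[[1, 2], [3, 4]]]
```

Training such a head would presumably regress its output against the target VAE encoder's latents for the same images (for example with an MSE loss); whether it can restore detail that DC-AE has already discarded is the open question raised here.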

recoilme • Dec 05 '24 11:12

Makes sense. It's surprising that Sana obtains better FID scores with this VAE, despite worse reconstruction results.

From the paper: "although AE-F8C16 exhibits the best reconstruction ability (rFID: F8C16 < F16C32 < F32C32), we empirically find that the generation results of F32C32 are superior."
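The AE-F{f}C{c} names encode the autoencoder's spatial downsampling factor f and its latent channel count c. A minimal sketch of what that means for a 1024×1024 RGB image, assuming the plain definition compression ratio = (3·H·W) / (c·(H/f)·(W/f)). F8C4 is added here as the SD-XL-style layout (the SD-3 VAE matches F8C16), and the `latent_stats` helper is hypothetical.

```python
def latent_stats(height, width, down, channels, image_channels=3):
    """Latent shape, spatial positions and compression ratio for an AE-F{down}C{channels}."""
    if height % down or width % down:
        raise ValueError("image size must be divisible by the downsampling factor")
    h, w = height // down, width // down
    # Ratio of raw pixel values to latent values.
    ratio = (image_channels * height * width) / (channels * h * w)
    return {"shape": (channels, h, w), "positions": h * w, "ratio": ratio}


# (downsampling factor, latent channels) in the paper's AE-F{f}C{c} notation.
CONFIGS = {"F8C4": (8, 4), "F8C16": (8, 16), "F16C32": (16, 32), "F32C32": (32, 32)}

for name, (down, channels) in CONFIGS.items():
    s = latent_stats(1024, 1024, down, channels)
    print(f"{name}: latent {s['shape']}, {s['positions']} positions, {s['ratio']:.0f}x compression")
# F8C4: latent (4, 128, 128), 16384 positions, 48x compression
# F8C16: latent (16, 128, 128), 16384 positions, 12x compression
# F16C32: latent (32, 64, 64), 4096 positions, 24x compression
# F32C32: latent (32, 32, 32), 1024 positions, 96x compression
```

F32C32 leaves the diffusion transformer 16 times fewer spatial positions than an f8 autoencoder, which fits the "fast" observation earlier in the thread, while its much higher compression is consistent with the lost detail reported here. The actual token count also depends on the transformer's patch size, which this sketch ignores.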

I wish they'd release checkpoints trained with other VAEs, allowing users to choose the one that works best for their specific dataset when fine-tuning.

srikarym • Dec 05 '24 16:12

@lawrence-cj Do you plan to release Sana trained with f8c4 / f8c16 VAEs?

srikarym • Dec 23 '24 00:12

@lawrence-cj Gentle reminder.

srikarym • Aug 12 '25 19:08