Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Hi, I'm wondering whether you have a 0.3B or even smaller version of SANA-Sprint that could run continuously at a decent FPS for real-time systems. Do you have experiments with...
1. update readme; 2. update CI test code;
Hi, please correct me if I'm wrong. I tried using the inverse function in [DPM-Solver](https://github.com/NVlabs/Sana/blob/47777fbdb245e8584ba829caea3d2326c13b2b50/diffusion/model/dpm_solver.py#L549) to invert the source latent to the noisy latent. After obtaining the noisy latent, I...
Hello, thank you for the great work! Could you provide the Dockerfile for setting up the training environment? I have seen the Dockerfile for running inference...
Hi authors, thanks for the brilliant work! Do you have any plans to publish Sana models for video generation?
Initial implementation of SANA-Sprint training script adapted for Diffusers. This needs further refinement and optimization. @lawrence-cj @sayakpaul
Hi, can we safely use the generated images for commercial purposes, for example in video games? Thanks.
Hi, I'm finetuning Sana-1.6M on 4K images (4096×4096) and encountered OOM on an H20-96G GPU during vae_encode. Mixed precision is enabled, batch size = 1. My questions: 1. How much memory...
Great work! In your full benchmark table, the emoji for FID seems to be inverted.
I tried your linear attention module and found that, in attn_matmul, `vk` has extremely large values especially when the sequence is long. I guess it is because your relu kernel...
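To make the observation in the last report concrete, here is a minimal NumPy sketch of a generic ReLU-kernel linear attention. It is not Sana's `attn_matmul`: the function names, shapes, and random inputs are all assumptions chosen for illustration, and `kv_state` only plays the role of what the report calls `vk`. The point it shows is that the unnormalized key-value sum accumulates over every token, so its magnitude grows with sequence length, while the normalized output stays bounded by the values.

```python
import numpy as np

def relu_linear_attention(q, k, v, eps=1e-6):
    """Generic single-head ReLU-kernel linear attention (illustrative only).

    q, k: (n, d) arrays; v: (n, d_v) array.
    Returns the normalized output and the unnormalized key-value state.
    """
    q = np.maximum(q, 0.0)  # ReLU feature map on queries
    k = np.maximum(k, 0.0)  # ReLU feature map on keys
    kv_state = k.T @ v      # (d, d_v): a sum over all n tokens
    k_sum = k.sum(axis=0)   # (d,): normalizer, also a sum over n tokens
    denom = q @ k_sum + eps
    out = (q @ kv_state) / denom[:, None]
    return out, kv_state

def state_stats(n, d=32, seed=0):
    """Max magnitudes of the state, the output, and the values for n tokens."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    # Assumption: values with a nonzero mean, so the sum does not cancel out.
    v = rng.standard_normal((n, d)) + 1.0
    out, kv_state = relu_linear_attention(q, k, v)
    return (float(np.abs(kv_state).max()),
            float(np.abs(out).max()),
            float(np.abs(v).max()))

for n in (256, 4096, 16384):
    kv_max, out_max, v_max = state_stats(n)
    print(f"n={n:6d}  max|kv_state|={kv_max:10.1f}  max|out|={out_max:.3f}  max|v|={v_max:.3f}")
```

Because the ReLU features are non-negative, each output row is a weighted average of the value rows, which is why the output stays small even when the intermediate state does not. If such a state were stored in float16, entries above 65504 would overflow; whether that applies to Sana depends on its dtype handling, which this sketch does not model.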