Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
```
25-01-05 17:35:47 - [Sana] - INFO - Load checkpoint from /home/linjl/zzc/Sana_1600M_1024px_MultiLing.pth. Load ema: False.
2025-01-05 17:35:47 - [Sana] - WARNING - Missing keys: ['pos_embed']
[rank0]: Traceback (most recent call...
```
Thanks for your work! Are you going to train ControlNets for this model?
People are asking me to further reduce VRAM usage. Currently, the 1K model needs at least 8.7 GB with VAE offloading. If we could run inference in FP8, that would reduce VRAM usage...
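To put the FP8 idea into numbers, here is a back-of-the-envelope sketch of weight-only memory per precision. It assumes roughly 1.6e9 parameters, taken from the nominal "1600M" in the checkpoint name, and deliberately ignores the text encoder, VAE, activations, and CUDA overhead, so it bounds only the transformer-weight part of the total.

```python
# Rough weight-only memory estimate per precision.
# Assumption: ~1.6e9 parameters, from the nominal "1600M" in the checkpoint name.
# Text encoder, VAE, activations, and CUDA overhead are NOT included.

def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Bytes needed to store the weights, converted to GiB."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 1_600_000_000

estimates = {
    name: weight_memory_gib(NUM_PARAMS, nbytes)
    for name, nbytes in [("FP32", 4), ("BF16", 2), ("FP8", 1)]
}

for name, gib in estimates.items():
    print(f"{name}: {gib:.2f} GiB")
```

This gives about 5.96 GiB (FP32), 2.98 GiB (BF16), and 1.49 GiB (FP8), so moving the transformer weights from BF16 to FP8 would save roughly 1.5 GiB; how much that shrinks the 8.7 GB total depends on what else stays resident on the GPU.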
Amazing work. A question about the tokenizer: in the tokenizer and embedding code there are references to Qwen2-0.5B-Instruct and 1.5B; I wanted to know if there have been any tests...
Hi. I set up a Sana training session with one 4090 GPU on a PC. Everything was fine, so I moved the config and the checkpoint to a PC with 7...
Hi, reading the README, I see that the data format looks like an uncompressed WebDataset. Is it possible to keep the tars uncompressed? Otherwise, inode usage becomes a problem....
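The inode concern comes from storing every image and caption as its own file. A minimal sketch of the alternative, assuming the usual WebDataset convention of grouping a sample's files by a shared key (`<key>.jpg`, `<key>.txt`), is to pack samples into plain, uncompressed tar shards and stream them back without extracting. The helpers `write_shard` and `read_shard` are hypothetical illustrations built on the standard `tarfile` module, not the project's data loader.

```python
import io
import os
import tarfile
import tempfile
from typing import Dict, Iterator, Tuple

Sample = Dict[str, bytes]  # extension -> raw file bytes (hypothetical layout)

def write_shard(path: str, samples: Dict[str, Sample]) -> None:
    """Pack samples into a single uncompressed tar, naming members <key>.<ext>."""
    with tarfile.open(path, mode="w") as tar:  # "w" = plain tar, no gzip/bz2
        for key, files in samples.items():
            for ext, data in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

def read_shard(path: str) -> Iterator[Tuple[str, Sample]]:
    """Stream samples straight from the tar; nothing is extracted to disk."""
    current_key, current = None, {}
    with tarfile.open(path, mode="r:") as tar:  # "r:" = read without decompression
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")
            if current and key != current_key:
                yield current_key, current  # previous sample is complete
                current = {}
            current_key = key
            current[ext] = tar.extractfile(member).read()
    if current:
        yield current_key, current

# Usage: one shard holds many samples, so it costs one inode instead of one per file.
shard_dir = tempfile.mkdtemp()
shard_path = os.path.join(shard_dir, "shard-000000.tar")
write_shard(shard_path, {
    "000001": {"jpg": b"<jpeg bytes>", "txt": b"a cat on a sofa"},
    "000002": {"jpg": b"<jpeg bytes>", "txt": b"a red bicycle"},
})
restored = dict(read_shard(shard_path))
```

Grouping relies on a sample's files being stored next to each other in the tar, which `write_shard` guarantees by writing one key at a time.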
Could you please tell me how to use non-square images for training in this project or the original project? The official example only has the ImgDataset type, but according to...
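A common way to train on non-square images is aspect-ratio bucketing: each image is assigned to the predefined (height, width) whose ratio is closest to its own, and batches are drawn from a single bucket so every tensor in a batch has the same shape. The sketch below assumes a small hypothetical bucket table around 1024x1024 pixels; it is not the project's own table or dataset class.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical bucket table: height/width ratio -> (height, width).
# Sizes are near 1024*1024 pixels and multiples of 32; NOT the project's actual table.
BUCKETS: Dict[float, Tuple[int, int]] = {
    0.5: (704, 1408),
    0.75: (864, 1152),
    1.0: (1024, 1024),
    4 / 3: (1152, 864),
    2.0: (1408, 704),
}

def closest_bucket(height: int, width: int) -> Tuple[int, int]:
    """Return the bucket whose height/width ratio is nearest to the image's."""
    ratio = height / width
    best = min(BUCKETS, key=lambda r: abs(r - ratio))
    return BUCKETS[best]

def group_by_bucket(sizes: List[Tuple[int, int]]) -> Dict[Tuple[int, int], List[int]]:
    """Group sample indices by target size so every batch shares one shape."""
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (h, w) in enumerate(sizes):
        groups[closest_bucket(h, w)].append(idx)
    return dict(groups)

# Usage: original (height, width) of four images.
groups = group_by_bucket([(720, 1280), (1080, 1920), (1000, 1000), (1200, 900)])
```

Each image would then be resized and center-cropped to its bucket's size before encoding; that step is omitted here.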
Implemented caching for VAE embeddings and local bucketing support. I decided not to implement caching for text embeddings because they consume an excessive amount of disk space, and the text...
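The disk-space argument can be checked with simple arithmetic. The shapes below are assumptions for illustration only: a 32-channel, 32x-downsampled latent for a 1024x1024 image, and a caption embedding of 300 tokens with 2304 channels (the width of a Gemma-2-2B-sized encoder), both stored as 16-bit floats with no file-format overhead.

```python
from math import prod

def tensor_bytes(shape, bytes_per_elem: int = 2) -> int:
    """Raw storage for a dense tensor (FP16/BF16 by default), ignoring file-format overhead."""
    return prod(shape) * bytes_per_elem

# Assumed shapes, for illustration only:
#   VAE latent: 32 channels x 32 x 32 (32x downsampling of a 1024x1024 image)
#   Text embedding: 300 tokens x 2304 channels
latent_bytes = tensor_bytes((32, 32, 32))
text_bytes = tensor_bytes((300, 2304))

print(f"latent per image:  {latent_bytes / 1024:.0f} KiB")
print(f"text per caption:  {text_bytes / 1024**2:.2f} MiB")
print(f"text / latent:     {text_bytes / latent_bytes:.1f}x")
```

Under these assumptions a cached caption is about 21 times larger than a cached latent: for one million samples, roughly 65.5 GB of latents versus 1.38 TB of text embeddings.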
CLIP
Could you please try training Sana together with CLIP, similar to how it's done in SDXL? I experimented with fine-tuning Sana on CLIP embeddings (I modified the caption channels), and...
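For reference, SDXL conditions on two text encoders by concatenating their per-token hidden states along the channel axis: 768 channels from CLIP ViT-L plus 1280 from OpenCLIP ViT-bigG give a 2048-channel context over 77 tokens. The sketch below shows only that fusion step with dummy features; `concat_channels` is a hypothetical helper, and applying the same idea to Sana would mean setting the caption input width to the sum of the encoders' widths. If the encoders return different token counts, they would first have to be padded or truncated to a common length, which is an assumption this sketch does not handle.

```python
from typing import List

TokenFeatures = List[List[float]]  # [num_tokens][channels]

def concat_channels(a: TokenFeatures, b: TokenFeatures) -> TokenFeatures:
    """SDXL-style fusion: join two encoders' per-token features along the channel axis."""
    if len(a) != len(b):
        raise ValueError(f"token counts differ: {len(a)} vs {len(b)}; pad or truncate first")
    return [ta + tb for ta, tb in zip(a, b)]

# Dummy features with SDXL's shapes: 77 tokens from CLIP ViT-L (768 ch) and OpenCLIP ViT-bigG (1280 ch).
clip_l = [[0.0] * 768 for _ in range(77)]
clip_g = [[0.0] * 1280 for _ in range(77)]
fused = concat_channels(clip_l, clip_g)

# The conditioning width the model's caption projection must accept.
caption_channels = len(fused[0])
```

Plain lists keep the example dependency-free; in a real training loop the same operation would be a tensor concatenation on the last dimension.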