[WIP] PixArt-Sigma training pipeline
Current state
- [x] Backbone (w/o external deps)
- [x] Model saving/loading (original format)
- [x] T5 support, T5 attention_mask carryover, 4-bit loading
- [x] T5 text embeddings and attention mask caching on disk (see the caching sketch after this list)
- [x] Wrap PixArt up as a NetworkTrainer for use in any training loop (skeleton after this list)
- [x] Add PixArt blocks to LoRA/etc. listings
- [ ] Combine T5 and the SDXL VAE into the checkpoint when saving, as recommended by FurkanGozukara (sketch after this list)
- [x] Model inference for sampling
- [ ] Do a test launch on the base model and on LoRA/etc. to verify compatibility and debug
- [ ] Test aspect-ratio conditioning
- [ ] Diffusers-format save mode and other leftover TODOs
- [ ] Set up a ~~ShareCaption/CogVLM2/LLaVA*/etc.~~ InternLM-XComposer2-4KHD-based multimodal prompt enhancer
- [ ] ControlNet-Transformer support for training and inference (waiting for the PixArt-Sigma repo's code release)
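
For the T5 caching items, here is a minimal sketch of encoding a prompt once and storing both the hidden states and the attention mask on disk, assuming the `transformers` and `bitsandbytes` libraries. The model id, the 300-token max length, and the `.npz` layout are illustrative assumptions, not the pipeline's final format.

```python
import numpy as np
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel, T5Tokenizer

MODEL_ID = "DeepFloyd/t5-v1_1-xxl"  # assumption: the T5 variant PixArt uses

tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
encoder = T5EncoderModel.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit load
)

def cache_text_embeddings(prompt: str, cache_path: str) -> None:
    # Tokenize with padding so the attention mask marks real tokens;
    # 300 is an assumed max length, not a confirmed pipeline constant.
    tokens = tokenizer(
        prompt,
        max_length=300,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = encoder(
            input_ids=tokens.input_ids.to(encoder.device),
            attention_mask=tokens.attention_mask.to(encoder.device),
        )
    # Cache both tensors: the mask must be carried over so the DiT's
    # cross-attention can ignore padding positions at train time.
    np.savez(
        cache_path,
        hidden_states=out.last_hidden_state.float().cpu().numpy(),
        attention_mask=tokens.attention_mask.cpu().numpy(),
    )
```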
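For the NetworkTrainer wrap-up, a rough skeleton of how the wrapper could subclass this repo's `NetworkTrainer`, following the same pattern as `sdxl_train_network.py`. The `PixartNetworkTrainer` name and the exact override signatures are assumptions.

```python
# Rough skeleton only; method names mirror the SdxlNetworkTrainer pattern
# in this repo, but treat the exact signatures as assumptions.
import train_network

class PixartNetworkTrainer(train_network.NetworkTrainer):
    def load_target_model(self, args, weight_dtype, accelerator):
        # Load the PixArt-Sigma DiT, T5 encoder, and SDXL VAE in place
        # of the usual U-Net / CLIP pair.
        ...

    def get_text_cond(self, args, accelerator, batch, tokenizers, text_encoders, weight_dtype):
        # Return T5 hidden states plus the attention mask, freshly
        # encoded or read back from the on-disk cache.
        ...

    def call_unet(self, args, accelerator, unet, noisy_latents, timesteps, text_conds, batch, weight_dtype):
        # Forward the DiT, passing along resolution / aspect-ratio
        # micro-conditioning in addition to the usual inputs.
        ...
```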
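For the combined checkpoint, a hedged sketch that packs the DiT, T5 encoder, and SDXL VAE state dicts into one `safetensors` file under key prefixes. The prefixes and the function name are hypothetical, not a finalized format.

```python
# Hypothetical sketch: merge three modules' state dicts into one file.
from safetensors.torch import save_file

def save_combined_checkpoint(transformer, text_encoder, vae, path):
    merged = {}
    for prefix, module in (
        ("transformer.", transformer),    # PixArt-Sigma DiT
        ("text_encoder.", text_encoder),  # T5 encoder
        ("vae.", vae),                    # SDXL VAE
    ):
        for key, tensor in module.state_dict().items():
            # Prefix keys so the three networks stay separable on load.
            merged[prefix + key] = tensor.contiguous().cpu()
    save_file(merged, path)
```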
I'm very excited to work with PixArt and its great size-to-prompt-adherence ratio, in addition to the awesome LoRA techniques in this repo; if it goes well, this should start working in a couple of days.
Addresses https://github.com/kohya-ss/sd-scripts/issues/979
PixArt repo: https://github.com/PixArt-alpha/PixArt-sigma
I will likely edit this post with updates, comments, and pictures quite often.