[WIP] PixArt-Sigma training pipeline
Current state
- [x] Backbone (w/o external deps)
- [x] Model saving/loading (original format)
- [x] T5 support, T5 attention_mask carryover, 4-bit loading
- [x] T5 text embeddings and attention mask caching on disk (see the caching sketch after this list)
- [x] Wrap PixArt up as a NetworkTrainer for use in any training loop (skeleton after this list)
- [x] Add PixArt blocks to LoRA/etc. listings
- [ ] Combine T5 and the SDXL VAE into the checkpoint when saving, as recommended by FurkanGozukara (sketch after this list)
- [x] Model inference for sampling
- [ ] Do a test launch on the base model and on LoRA/etc. to verify compatibility and debug
- [ ] Test aspect-ratio conditioning
- [ ] Diffusers-format save mode and other leftover TODOs
- [ ] Set up a ~~ShareCaption/CogVLM2/LLaVA*/etc.~~ InternLM-XComposer2-4KHD-based multimodal prompt enhancer
- [ ] ControlNet-Transformer support for training and inference (waiting for the PixArt-Sigma repo's code release)
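
For the T5 caching items, here is a minimal sketch of encoding a prompt once and storing both the hidden states and the attention mask on disk, assuming the `transformers` and `bitsandbytes` libraries. The model id, the 300-token max length, and the `.npz` layout are illustrative assumptions, not the pipeline's final format.

```python
import numpy as np
import torch
from transformers import BitsAndBytesConfig, T5EncoderModel, T5Tokenizer

MODEL_ID = "DeepFloyd/t5-v1_1-xxl"  # assumption: the T5 variant PixArt uses

tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
encoder = T5EncoderModel.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit load
)

def cache_text_embeddings(prompt: str, cache_path: str) -> None:
    # Tokenize with padding so the attention mask marks real tokens;
    # 300 is an assumed max length, not a confirmed pipeline constant.
    tokens = tokenizer(
        prompt,
        max_length=300,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = encoder(
            input_ids=tokens.input_ids.to(encoder.device),
            attention_mask=tokens.attention_mask.to(encoder.device),
        )
    # Cache both tensors: the mask must be carried over so the DiT's
    # cross-attention can ignore padding positions at train time.
    np.savez(
        cache_path,
        hidden_states=out.last_hidden_state.float().cpu().numpy(),
        attention_mask=tokens.attention_mask.cpu().numpy(),
    )
```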
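For the NetworkTrainer wrap-up, a rough skeleton of how the wrapper could subclass this repo's `NetworkTrainer`, following the same pattern as `sdxl_train_network.py`. The `PixartNetworkTrainer` name and the exact override signatures are assumptions.

```python
# Rough skeleton only; method names mirror the SdxlNetworkTrainer pattern
# in this repo, but treat the exact signatures as assumptions.
import train_network

class PixartNetworkTrainer(train_network.NetworkTrainer):
    def load_target_model(self, args, weight_dtype, accelerator):
        # Load the PixArt-Sigma DiT, T5 encoder, and SDXL VAE in place
        # of the usual U-Net / CLIP pair.
        ...

    def get_text_cond(self, args, accelerator, batch, tokenizers, text_encoders, weight_dtype):
        # Return T5 hidden states plus the attention mask, freshly
        # encoded or read back from the on-disk cache.
        ...

    def call_unet(self, args, accelerator, unet, noisy_latents, timesteps, text_conds, batch, weight_dtype):
        # Forward the DiT, passing along resolution / aspect-ratio
        # micro-conditioning in addition to the usual inputs.
        ...
```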
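For the combined checkpoint, a hedged sketch that packs the DiT, T5 encoder, and SDXL VAE state dicts into one `safetensors` file under key prefixes. The prefixes and the function name are hypothetical, not a finalized format.

```python
# Hypothetical sketch: merge three modules' state dicts into one file.
from safetensors.torch import save_file

def save_combined_checkpoint(transformer, text_encoder, vae, path):
    merged = {}
    for prefix, module in (
        ("transformer.", transformer),    # PixArt-Sigma DiT
        ("text_encoder.", text_encoder),  # T5 encoder
        ("vae.", vae),                    # SDXL VAE
    ):
        for key, tensor in module.state_dict().items():
            # Prefix keys so the three networks stay separable on load.
            merged[prefix + key] = tensor.contiguous().cpu()
    save_file(merged, path)
```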
I'm very excited to work with PixArt and its great size-to-prompt-adherence ratio, in addition to the awesome LoRA techniques in this repo; if it goes well, this should start working in a couple of days.
Addresses https://github.com/kohya-ss/sd-scripts/issues/979
PixArt repo: https://github.com/PixArt-alpha/PixArt-sigma
I will likely edit this post with updates, comments, and pictures quite often.