[WIP] PixArt-Sigma training pipeline

Open kabachuha opened this issue 9 months ago • 11 comments

Current state

  • [x] Backbone (w/o external deps)
  • [x] Model save-loading (orig format)
  • [x] T5 support, T5's attention_mask carryover, load in 4bit
  • [x] T5 text embeddings and attention mask caching on disk
  • [x] Wrap up Pixart as a NetworkTrainer for use in any train loops
  • [x] Add pixart blocks for lora/etc listings
  • [ ] Combine T5 and SDXL vae in checkpoint when saving, recommended by FurkanGozukara
  • [x] Model inference for sampling
  • [ ] Do test launch on base and lora/etc to test compat and debug
  • [ ] Test aspect-ratio conditioning
  • [ ] Diffusers format save mode and other leftover TODOs
  • [ ] Setup a ~~ShareCaption/CogVLM2/LLaVA*/etc.~~ InternLM-XComposer2-4KHD-based multimodal prompt enhancer
  • [ ] ControlNet-Transformer support for training and inference (waiting for PA-Sigma's repo code release)
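
For readers unfamiliar with the "caching on disk" item above, here is a minimal sketch of the general idea, not the code in this PR. It assumes one NumPy `.npz` file per caption, keyed by a hash of the caption text. The function names (`cache_path`, `save_text_cache`, `load_text_cache`) and the file layout are hypothetical, and the demo uses random arrays in place of real T5 encoder output.

```python
import hashlib
import tempfile
from pathlib import Path

import numpy as np


def cache_path(cache_dir: Path, caption: str) -> Path:
    """Map a caption to a stable file name (hypothetical layout)."""
    digest = hashlib.sha256(caption.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.npz"


def save_text_cache(cache_dir: Path, caption: str,
                    embeddings: np.ndarray, attention_mask: np.ndarray) -> Path:
    """Store the text embeddings together with their attention mask."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_path(cache_dir, caption)
    np.savez(
        path,
        embeddings=embeddings.astype(np.float16),       # halve the disk footprint
        attention_mask=attention_mask.astype(np.bool_),  # True = real token, False = padding
    )
    return path


def load_text_cache(cache_dir: Path, caption: str):
    """Return (embeddings, attention_mask), or None if the caption is not cached."""
    path = cache_path(cache_dir, caption)
    if not path.exists():
        return None
    with np.load(path) as data:
        return data["embeddings"], data["attention_mask"]


if __name__ == "__main__":
    # Stand-in for encoder output: 6 token slots, 8 features, 4 real tokens.
    fake_embeddings = np.random.rand(6, 8).astype(np.float32)
    fake_mask = np.array([1, 1, 1, 1, 0, 0])

    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        save_text_cache(cache_dir, "a photo of a cat", fake_embeddings, fake_mask)
        embeddings, mask = load_text_cache(cache_dir, "a photo of a cat")
        print(embeddings.shape, int(mask.sum()))
```

Storing the mask next to the embeddings means a training loop can skip the text encoder entirely on later epochs while still masking padded tokens in cross-attention.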

I'm very excited to work with PixArt and its great size-to-prompt-adherence ratio, in addition to the awesome LoRA techniques in this repo. If it goes well, it should start working in a couple of days.

Addresses https://github.com/kohya-ss/sd-scripts/issues/979

PixArt repo: https://github.com/PixArt-alpha/PixArt-sigma

I will very likely edit this post often with updates, comments, and pictures.

kabachuha · May 19 '24 22:05