stable-diffusion.cpp
[Feature] Sequential model loading
Feature Summary
The ability to load models one at a time, loading each model only when the computation needs it, to reduce memory usage.
Detailed Description
I'm working on an Android library that leverages llama.cpp and stable-diffusion.cpp for easy on-device inference, but I'm memory limited for some models because the encoder, VAE, and UNet are all loaded at the same time. Would it be possible to add sequential model loading, where each model is loaded only when it is needed, so that only one model is resident at a time? This would drastically reduce the memory footprint.
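To make the intent concrete, here is a minimal, self-contained C++ sketch of the control flow I have in mind. All names here (load_module, free_module, the stage file names) are illustrative stubs and not existing stable-diffusion.cpp API; the point is only that each stage is loaded, run, and freed before the next one is loaded.

```cpp
// Sketch of sequential (load -> run -> free) pipeline execution.
// Stubs only; not stable-diffusion.cpp API.
#include <cstdio>
#include <string>
#include <vector>

struct Module {
    std::string name;            // which stage these weights belong to
    std::vector<float> weights;  // stand-in for the real tensors
};

// Stub loader: a real implementation would read the weight file and
// upload tensors to the backend; here it just allocates a placeholder.
static Module load_module(const std::string& name) {
    std::printf("loading %s\n", name.c_str());
    return Module{name, std::vector<float>(16, 0.0f)};
}

// Stub unload: releasing the vector stands in for freeing weights/VRAM.
static void free_module(Module& m) {
    std::printf("freeing %s\n", m.name.c_str());
    m.weights.clear();
    m.weights.shrink_to_fit();
}

int main() {
    // 1. Text encoder: load, run, free before anything else is resident.
    Module clip = load_module("text_encoder");
    std::vector<float> cond(8, 1.0f);          // pretend prompt conditioning
    free_module(clip);

    // 2. UNet: only now is the (usually largest) model loaded.
    Module unet = load_module("unet");
    std::vector<float> latents(cond.size(), 0.5f);  // pretend denoised latents
    free_module(unet);

    // 3. VAE decode: again, nothing else is loaded alongside it.
    Module vae = load_module("vae");
    std::printf("decoded %zu latents with %s\n", latents.size(), vae.name.c_str());
    free_module(vae);

    // Peak memory ~ the largest single model instead of the sum of all three.
    return 0;
}
```

The trade-off is extra load time per image (each stage's weights are re-read when it runs), which for my use case is acceptable compared with not being able to run the pipeline at all.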
Alternatives you considered
No response
Additional context
No response
Possibly a duplicate of #908.