stable-diffusion.cpp
[Feature] Sequential model loading
Feature Summary
The ability to load models one at a time, loading each model only when the computation needs it, to reduce memory usage.
Detailed Description
I'm working on an Android library that leverages llama.cpp and stable-diffusion.cpp for easy on-device inference, but I'm memory limited for some models because the encoder, VAE, and UNet are all loaded at the same time. Would it be possible to add sequential model loading, where each model is loaded only when it is needed, so that only one model is resident at a time? This would drastically reduce the memory footprint.
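To make the intent concrete, here is a minimal, self-contained C++ sketch of the control flow I have in mind. All names here (load_module, free_module, the stage file names) are illustrative stubs and not existing stable-diffusion.cpp API; the point is only that each stage is loaded, run, and freed before the next one is loaded.

```cpp
// Sketch of sequential (load -> run -> free) pipeline execution.
// Stubs only; not stable-diffusion.cpp API.
#include <cstdio>
#include <string>
#include <vector>

struct Module {
    std::string name;            // which stage these weights belong to
    std::vector<float> weights;  // stand-in for the real tensors
};

// Stub loader: a real implementation would read the weight file and
// upload tensors to the backend; here it just allocates a placeholder.
static Module load_module(const std::string& name) {
    std::printf("loading %s\n", name.c_str());
    return Module{name, std::vector<float>(16, 0.0f)};
}

// Stub unload: releasing the vector stands in for freeing weights/VRAM.
static void free_module(Module& m) {
    std::printf("freeing %s\n", m.name.c_str());
    m.weights.clear();
    m.weights.shrink_to_fit();
}

int main() {
    // 1. Text encoder: load, run, free before anything else is resident.
    Module clip = load_module("text_encoder");
    std::vector<float> cond(8, 1.0f);          // pretend prompt conditioning
    free_module(clip);

    // 2. UNet: only now is the (usually largest) model loaded.
    Module unet = load_module("unet");
    std::vector<float> latents(cond.size(), 0.5f);  // pretend denoised latents
    free_module(unet);

    // 3. VAE decode: again, nothing else is loaded alongside it.
    Module vae = load_module("vae");
    std::printf("decoded %zu latents with %s\n", latents.size(), vae.name.c_str());
    free_module(vae);

    // Peak memory ~ the largest single model instead of the sum of all three.
    return 0;
}
```

The trade-off is extra load time per image (each stage's weights are re-read when it runs), which for my use case is acceptable compared with not being able to run the pipeline at all.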
Alternatives you considered
No response
Additional context
No response
Possibly a duplicate of #908.