[diffusers]: Model Cache
Model loading is significantly different with diffusers, and I'm not sure how best to integrate it with the existing ModelCache: https://github.com/invoke-ai/InvokeAI/blob/c607d4fe6ccb0f13056d5fee5346989255fe3ccd/ldm/invoke/model_cache.py#L29-L37
diffusers takes advantage of 🤗accelerate by default. I don't know much about that library, but it seems like the sort of "offload this model state to CPU until we need it again" behavior ModelCache implements is already provided there: see the accelerate docs on Dispatching and Offloading Models. I hope this means we can drop a lot of the existing code from ModelCache.
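For context, the pattern ModelCache implements by hand (and accelerate provides for real) is roughly "keep one model on the execution device, park the rest in system RAM." A dependency-free toy sketch of that idea; the `Model` class, device strings, and method names here are illustrative, not the real ModelCache or accelerate API:

```python
# Toy sketch of the "offload to CPU until we need it again" pattern.
# Everything here is illustrative; real code would move torch tensors.

class Model:
    def __init__(self, name):
        self.name = name
        self.device = "cpu"   # models start parked in system RAM

class ToyModelCache:
    def __init__(self, execution_device="cuda"):
        self.execution_device = execution_device
        self.models = {}
        self.active = None    # at most one model lives on the GPU

    def get(self, name):
        model = self.models.setdefault(name, Model(name))
        if self.active is not None and self.active is not model:
            self.active.device = "cpu"          # offload the previous model
        model.device = self.execution_device    # bring this one onto the GPU
        self.active = model
        return model

cache = ToyModelCache()
a = cache.get("stable-diffusion-1.5")
b = cache.get("inpainting")
print(a.device, b.device)  # the first model has been offloaded back to cpu
```

accelerate's hooks do this per-submodule rather than per-pipeline, which is part of why it could replace a chunk of ModelCache rather than just complement it.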
I haven't been using ModelCache's "offload to cpu" functionality even on the main branch because it always took way more memory than I expected it to and quickly summoned the Out-Of-Memory Killer.
I do have fast storage and I don't have a ton of spare RAM, so I don't think I'm the target audience for the model caching/offloading feature and I need to delegate the ModelCache/diffusers integration to someone who properly appreciates it.
Some of this potential integration with accelerate could probably even be done on the main branch. But since #1583 already changes the ModelCache file a fair bit, and since the way we interact with accelerate probably differs with and without diffusers (diffusers already sets it up somewhat), I expect a PR for this should target dev/diffusers and not main.
There is going to be a question at some point "is the default location for models in $INVOKE_ROOT/models or the huggingface-hub cache?"
I don't care that much one way or the other, but I have a few points in favor of "leave them in the huggingface-hub cache":
- People have said "wouldn't it be great if there were a standard default location all programs used?" If you let huggingface-hub manage your resources, there is!
Thwarted argument in favor:
- I was going to say "you already are using the huggingface-hub cache for all the support models from `transformers`", but now I see that since I originally looked at `preload_models`, those have all been rewritten to explicitly pass a `cache_dir` to their `from_pretrained` calls.
Argument against:
- a significant number of people need this stuff to be on a specific filesystem or drive, and we can't assume they set up `$HUGGINGFACE_HUB_CACHE`.
- on the other other hand, maybe we should configure `$HUGGINGFACE_HUB_CACHE` for them as part of our process, because if they don't want `invokeai/models` on their home filesystem, they probably don't want the hub cache there either.
I think this has convinced me that (a) yes, use the huggingface hub cache, and also (b) manage the setting of `$HUGGINGFACE_HUB_CACHE`.
Though I realize that is going to run smack up against the "all of the things are always under `$INVOKEAI_ROOT`" philosophy, so it might not fly.
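Concretely, managing that setting could be as small as pointing the variable under the root at startup. A sketch under stated assumptions: `HUGGINGFACE_HUB_CACHE` is a real environment variable that huggingface_hub reads, but the `models/hub` subdirectory and the startup-time approach are just one possible design, not anything decided here:

```python
# Sketch: point the huggingface-hub cache under $INVOKEAI_ROOT.
# HUGGINGFACE_HUB_CACHE is read by huggingface_hub at import time,
# so this would need to run before that library is imported.
import os
from pathlib import Path

invokeai_root = Path(os.environ.get("INVOKEAI_ROOT", Path.home() / "invokeai"))
hub_cache = invokeai_root / "models" / "hub"   # illustrative layout

# setdefault so an explicit user setting still wins
os.environ.setdefault("HUGGINGFACE_HUB_CACHE", str(hub_cache))
print(os.environ["HUGGINGFACE_HUB_CACHE"])
```

This keeps the "everything under `$INVOKEAI_ROOT`" property while still letting huggingface-hub own the directory structure inside the cache.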
It's actually not a problem. We already download a bunch of Hugging Face models that would usually go into its cache and just redirect them into `$INVOKEAI_ROOT`. There's a well-established pattern for this.
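The redirect pattern amounts to resolving one directory under the root and passing it explicitly to every download call. A minimal sketch of the path logic; the `models_dir` helper and the `models/hub` layout are hypothetical stand-ins for whatever the codebase actually uses:

```python
# Sketch of the "redirect hub downloads into $INVOKEAI_ROOT" pattern.
# models_dir and its layout are illustrative, not InvokeAI's real helper.
import os
from pathlib import Path

def models_dir(root=None):
    """Resolve where downloaded hub models should land."""
    root = Path(root or os.environ.get("INVOKEAI_ROOT", Path.home() / "invokeai"))
    return root / "models" / "hub"

# Each from_pretrained-style call then passes this explicitly, e.g.
#   SomeModel.from_pretrained(repo_id, cache_dir=models_dir())
# (SomeModel stands in for any of the support models.)
print(models_dir("/tmp/invokeai"))
```

`cache_dir` is a real parameter on the 🤗 `from_pretrained` methods, which is why this works without touching `$HUGGINGFACE_HUB_CACHE` at all.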
I think we've worked this out. There's a system in place for deciding where models go on disk, and the model manager is able to offload diffusers models to system memory when it swaps them out.
(Using accelerate didn't work out so well, see #2542.)