InvokeAI
Model cycling?
I'm not suggesting loading them all at startup, but it would be nice to keep the models loaded so we can switch between them quickly.
These models are not light. Even with the heavy optimizations this repo brings in, they still take up around 2.1 GB of memory each. Having multiple models loaded means compromising on everything else.
Not worth it at all, IMO. Restarting the script is quite fast, and changing models can be done in an instant by swapping the config files.
Yeah, loading a model takes 10 seconds; it's not worth keeping more than one in RAM.
I hear you, but 2 GB in RAM doesn't even start to worry me one bit. Wasting 10 s every time I cycle a model really does. Maybe add a -U parameter to unload the model when switching?
Video memory. Not RAM.
We could keep the models in CPU RAM and move them into VRAM to activate them. There's a low-memory optimization, added just today, that migrates the models out of VRAM into RAM during image-saving operations. We just need to give the user more control over this.
So consider it added to the TODO, but not in the next release.
Can confirm that calling model.to("cpu") and model.to("cuda") does indeed work perfectly for quickly switching between two models (tested with Waifu Diffusion and Stable Diffusion).
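For reference, the swap itself only takes a couple of lines. A minimal sketch, assuming model_a and model_b are two checkpoints already loaded into system RAM (the names are placeholders):

```python
import torch

def activate(model_to_use, model_to_park):
    """Park one model in CPU RAM and move the other into VRAM."""
    model_to_park.to("cpu")    # frees VRAM but keeps the weights resident in RAM
    model_to_use.to("cuda")    # takes a second or two depending on the hardware
    torch.cuda.empty_cache()   # release cached blocks left behind by the parked model
    return model_to_use

# active = activate(model_b, model_a)  # switch from model_a to model_b
```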
I'm working on some code (by invoking ldm.generate.Generate()) that puts a queueing system behind InvokeAI (our instance is accessible in various other ways, such as Discord and IRC), and switching between a few models is something we would love to have.
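As a rough sketch of what the worker looks like (the prompt2image call and its (image, seed) return shape are assumptions about the Generate API, so adapt to the real signature):

```python
import queue
import threading

from ldm.generate import Generate

job_queue = queue.Queue()   # jobs get pushed here from the Discord/IRC bridges
generator = Generate()      # loads the model configured in models.yaml

def worker():
    while True:
        job = job_queue.get()   # e.g. {"prompt": "a castle at sunset"}
        try:
            # prompt2image is assumed to return (image, seed) pairs
            for image, seed in generator.prompt2image(prompt=job["prompt"]):
                image.save(f"outputs/{seed}.png")
        finally:
            job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```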
@Melanpan I've got a queueing system in place in the new API =). It's great to hear that switching is quick; how quick is it in practice? I was considering allowing the user to provide the model name as part of the request, but only if models can be swapped quickly and a typical user can hold multiple models in memory.
@Kyle0654 I must admit I haven't looked at the new web interface yet. It having its own queueing system definitely sounds very useful, but our implementation is completely in-memory (using Redis as the backend) and also allows us to scale up to more than one machine running the backend, so I'm not sure we would want to switch. We also like to run some other GFPGAN models after inference, as well as send statistics to InfluxDB to generate some sexy usage graphs. https://i.imgur.com/T6AMUuq.png
Moving the model around is relatively fast: CPU to CUDA takes ~1.4280633926391602 seconds and CUDA to CPU takes ~2.609034538269043 seconds. It would probably be good to keep track of which model is currently loaded in memory and not "suspend" models right after generating, but only when needed.
Specs of my dev machine:
- 2x Intel Xeon E5-2670 v0 (8 cores for the VM)
- 128 GB DDR3 ECC (32 GB for the VM)
- Nvidia GeForce 1050 Ti
I'm curious to know how much of an increase in speed the "suspending" and "resuming" gets with a slightly faster CPU.
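If anyone wants to compare on their own hardware, the timings can be reproduced with a few lines of PyTorch (model stands for whatever checkpoint is loaded; the numbers will vary with PCIe bandwidth and RAM speed):

```python
import time
import torch

def timed_move(model, device):
    """Move the model and return the elapsed wall-clock time in seconds."""
    torch.cuda.synchronize()
    start = time.time()
    model.to(device)
    torch.cuda.synchronize()
    return time.time() - start

# print(f"cpu -> cuda: {timed_move(model, 'cuda'):.2f}s")
# print(f"cuda -> cpu: {timed_move(model, 'cpu'):.2f}s")
```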
My goal has been horizontal scalability with the API. I've been considering that different backends might want to pull from the queue based on their loaded model(s) as well ☺️. I'm keeping it a bit simpler initially, but with lots of room to replace components to achieve that goal.
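One way that routing could look is a Redis list per model name, where a backend only blocks on the queues for models it already has loaded. A sketch, with made-up key names and job format (nothing here is from the actual API):

```python
import json
import redis

r = redis.Redis()

def submit(model_name, prompt):
    # Each model gets its own job list, e.g. "jobs:stable-diffusion-1.4"
    r.rpush(f"jobs:{model_name}", json.dumps({"prompt": prompt}))

def pull_job(loaded_models):
    # A backend only blocks on the queues for models it already has in memory
    keys = [f"jobs:{name}" for name in loaded_models]
    item = r.blpop(keys, timeout=5)
    if item is None:
        return None
    _queue_key, payload = item
    return json.loads(payload)

# job = pull_job(["stable-diffusion-1.4", "waifu-diffusion"])
```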
Thank you for considering this. Keep in mind that one of the pillars of interest here, at least to me, is keeping the dream.py main loop alive at all times. Having to reload the script to load a model is definitely questionable... architectural refactoring is needed.
Oh, I completely agree. I didn't like having to rerun the script to switch the model, but I wasn't sure how feasible it'd be to keep multiple models loaded at once or swap them at runtime. At a minimum, I wanted to support unloading the current model and loading a new one. It sounds like keeping multiple models loaded may be pretty feasible, though.
Moving the model around is relatively fast: CPU to CUDA takes ~1.4280633926391602 seconds and CUDA to CPU takes ~2.609034538269043 seconds.
Man, you seem to really hate the concept of rounding numbers.
I guess he also likes to say the numbers out loud! :)
This is interesting and maybe doable (lstein is on board), but how do the timings on your "2x Intel Xeon E5-2670 v0 (8 cores for the VM)" translate to the average user's hardware (e.g. a consumer-grade i5/i7)?
So I have implemented both model switching and in-RAM model caching; see PR #948. On my 10-year-old Xeon E3-1270 V2 (Ivy Bridge, roughly an Intel Core i7-3770) with 32 GB of DDR3 RAM, model switching takes 2.38 s. I hope it will be no slower on modern hardware.
I am in the process of merging PR #948 for fast model switching. I've incorporated their code into a model cache manager that keeps an eye on the amount of free CPU RAM to avoid out-of-memory errors. The feature will appear in the next point release.
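For anyone curious about the general idea, here is a simplified sketch (not the code from PR #948; the load_model() helper and the free-RAM threshold are placeholders): recently used models stay parked on the CPU, the least recently used one is dropped when free RAM runs low, and only the requested model lives in VRAM.

```python
from collections import OrderedDict

import psutil
import torch

MIN_FREE_BYTES = 4 * 1024**3  # stop caching new models below ~4 GB of free RAM

class ModelCache:
    def __init__(self):
        self.models = OrderedDict()  # name -> model, ordered by last use
        self.active = None

    def get(self, name):
        # Park the currently active model in CPU RAM before switching
        if self.active and self.active != name:
            parked = self.models.get(self.active)
            if parked is not None:
                parked.to("cpu")
        if name not in self.models:
            # Evict least recently used models until enough RAM is free
            while self.models and psutil.virtual_memory().available < MIN_FREE_BYTES:
                self.models.popitem(last=False)
            self.models[name] = load_model(name)  # placeholder loader
        self.models.move_to_end(name)
        self.models[name].to("cuda")
        self.active = name
        torch.cuda.empty_cache()
        return self.models[name]
```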
The model switch feels instantaneous on my VT100 system.
Is it possible to switch models by specifying a path?
Pretty annoying having to edit a YAML every time I'm testing a DreamBooth checkpoint.
Good job so far!
Pretty annoying having to edit a YAML every time I'm testing a DreamBooth checkpoint.
Drifting off-subject but still relevant to the "annoying" part: I'd definitely vote for re-animating the deprecated "weights" argument and exorcising the need to update the YAML nonsense. If a config file is an absolute must, consider using a JSON file instead.
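For illustration, "switching by path" could boil down to loading a checkpoint's weights straight from disk into the already-instantiated model, with no YAML edit at all. A sketch (the "state_dict" key and strict=False are typical of Stable Diffusion .ckpt files, but check them against your own checkpoint):

```python
import torch

def load_weights_from_path(model, ckpt_path):
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)  # some ckpts nest the weights
    model.load_state_dict(state_dict, strict=False)
    return model.to("cuda")

# model = load_weights_from_path(model, "/path/to/my-dreambooth.ckpt")
```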