
Better documentation of how not to load the model onto CUDA for every generation

Open xzitlou opened this issue 2 years ago • 13 comments

Every time I try to use the "imagine 'something...'" command, it loads the Hugging Face model onto CUDA. Is there anything to keep the model loaded on CUDA so the process runs faster? I want to run it in workers, but I don't know if there is any way to load the model onto CUDA just once so the workers don't have to load it on every request.

xzitlou avatar Jan 26 '23 04:01 xzitlou

It should only be doing that if you're switching models or on the first image. Please provide logs.

brycedrennan avatar Jan 26 '23 04:01 brycedrennan

I'll reopen when we get more info.

brycedrennan avatar Jan 26 '23 06:01 brycedrennan

This looks like a question I also had, so I'll reply here instead of creating a new issue.

I believe what xzitlou was asking is whether there is a way to keep the model loaded for more than one batch of prompts/tasks.

Personally I would be interested in a way to load the model and run inference on it separately. For example:

- Load the model into VRAM/RAM with one command.
- Call a command that uses that model and just performs inference on it.
- Run another command with that model, but with different prompts/sizes.

This would prevent the need to load the model every time you run a new task, and would open up a lot of uses.

Astropulse avatar Jan 30 '23 15:01 Astropulse

I see I misunderstood. My bad. This is already supported, so we should figure out how to improve the docs. Yeah, it's super annoying to have to reload the model every time.

brycedrennan avatar Jan 30 '23 16:01 brycedrennan

Try running `aimg`.

brycedrennan avatar Jan 30 '23 16:01 brycedrennan
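
For reference, a minimal sketch of what "already supported" looks like from Python rather than the `aimg` shell: within one long-lived process the weights are loaded onto CUDA once and reused for every prompt. This assumes the `ImaginePrompt` / `imagine_image_files` helpers described in the imaginAIry README; the prompt text and output directory here are made up for illustration.

```python
# Minimal sketch: one process, model loaded once, reused for every prompt.
# Assumes the ImaginePrompt / imagine_image_files API from the README;
# prompt text and outdir are illustrative.
from imaginairy import ImaginePrompt, imagine_image_files

prompts = [
    ImaginePrompt("a scenic mountain lake at sunrise"),
    ImaginePrompt("a red fox in the snow", width=512, height=512),
]

# The first prompt triggers the CUDA load; the rest reuse the cached model.
imagine_image_files(prompts, outdir="./outputs")
```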

Ah! Brilliant, thank you.

I'm specifically wondering if it could be used across multiple terminals; I assume not at the moment.

Astropulse avatar Jan 30 '23 16:01 Astropulse

It could not work across multiple terminals as is. I welcome any ideas you have about what architecture would support that.

brycedrennan avatar Feb 03 '23 22:02 brycedrennan

I'm not super experienced in terms of memory management. I figure that since the model is loaded into VRAM, calling inference on it from another process should just be a matter of knowing where it is. I'm not sure if that's possible with how PyTorch actually handles loading and unloading models, though.

Astropulse avatar Feb 03 '23 23:02 Astropulse

I agree it's theoretically possible. I doubt it's easily possible. Python isn't great at this sort of thing.

brycedrennan avatar Feb 04 '23 01:02 brycedrennan

I'll keep looking into it and see if I find anything. I have a very strange set of restrictions for my Stable Diffusion environment.

On another note, is there any way I can donate to the development of this tool? It's by far the best CLI implementation I've come across.

Astropulse avatar Feb 04 '23 03:02 Astropulse

That's very generous of you. I can set something up to facilitate donations. Do you have a preference or recommendation?

Help me understand the environment, because even if you had multiple shells open you wouldn't be able to generate more than one image at a time due to VRAM limitations, so what would be the point?

brycedrennan avatar Feb 04 '23 04:02 brycedrennan

In my experience Buy Me a Coffee is quite nice for handling donations.

My environment is based on using Lua to call CLI commands. It can either create a terminal and ignore it (making it unable to retrieve generated images or send new commands to the process) or open a terminal and wait for it to close. I've been looking for a way to start one terminal as a 'model loader' and background it, then open other terminals to generate images and retrieve the results. If there's a simpler way to handle that, I'd love to hear ideas; I'm sure there's a solution I'm just unaware of.

Astropulse avatar Feb 04 '23 04:02 Astropulse

Interesting, I'm gonna have to think about that. I haven't used Lua before. The first solution that comes to mind is a FastAPI server with an internal queue: you submit generation requests via HTTP.

Set this up just now: https://www.buymeacoffee.com/brycedrennan

brycedrennan avatar Feb 04 '23 15:02 brycedrennan
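
A rough sketch of that FastAPI idea, not something shipped with imaginAIry: one long-lived process keeps the model in VRAM, and a lock stands in for the internal queue, serializing requests since only one image can be generated at a time anyway. The endpoint name, request fields, and use of `imagine_image_files` are illustrative assumptions.

```python
# Rough sketch of the suggested FastAPI server (not part of imaginAIry).
# One long-lived process keeps the model loaded; a lock serializes requests
# as a stand-in for an internal queue. Endpoint name and fields are illustrative.
import threading

from fastapi import FastAPI
from pydantic import BaseModel

from imaginairy import ImaginePrompt, imagine_image_files

app = FastAPI()
generation_lock = threading.Lock()  # one generation at a time (VRAM-bound)


class GenerateRequest(BaseModel):
    prompt: str
    width: int = 512
    height: int = 512


@app.post("/generate")
def generate(req: GenerateRequest):
    prompt = ImaginePrompt(req.prompt, width=req.width, height=req.height)
    with generation_lock:
        # The model stays resident between requests; only the first call loads it.
        imagine_image_files([prompt], outdir="./outputs")
    return {"status": "done", "outdir": "./outputs"}
```

Run it with something like `uvicorn server:app`; the Lua side (or any other worker) then POSTs prompts over HTTP instead of shelling out to `imagine`, so the model is only loaded once.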

Ah, as expected it was a simple solution that just didn't occur to me. Thank you.

Astropulse avatar Feb 05 '23 05:02 Astropulse

I've added a message advertising the `aimg` shell at the top of the output of every command that is not run via the shell.

brycedrennan avatar Feb 05 '23 17:02 brycedrennan

Also, @Astropulse, thanks for the generous donation!

brycedrennan avatar Feb 05 '23 18:02 brycedrennan