
Better documentation of how not to load the model onto CUDA for every generation

Open xzitlou opened this issue 2 years ago • 13 comments

Every time I try to use the "imagine 'something...'" command, it loads the Hugging Face model onto CUDA. Is there anything to keep the model loaded on CUDA so the process runs faster? I want to run it in workers, but I don't know if there is any way to load the model onto CUDA just once so the workers don't have to load it on every request.

xzitlou avatar Jan 26 '23 04:01 xzitlou

It should only be doing that if you're switching models or on the first image. Please provide logs.

brycedrennan avatar Jan 26 '23 04:01 brycedrennan

I'll reopen when we get more info.

brycedrennan avatar Jan 26 '23 06:01 brycedrennan

This looks like a question I also had, so I'll reply here instead of creating a new issue.

I believe what xzitlou was asking is whether there is a way to keep the model loaded for more than one batch of prompts/tasks.

Personally I would be interested in a way to load the model and run inference on it separately. For example:

- Load the model into VRAM/RAM with one command.
- Call a command that uses that model and just performs inference on it.
- Run another command with that model, but with different prompts/sizes.

This would prevent the need to load the model every time you run a new task, and would open up a lot of uses.

Astropulse avatar Jan 30 '23 15:01 Astropulse

I see I misunderstood. My bad. This is already supported, so we should figure out how to improve the docs. Yeah, it's super annoying to have to reload the model every time.

brycedrennan avatar Jan 30 '23 16:01 brycedrennan

Try running `aimg`.

brycedrennan avatar Jan 30 '23 16:01 brycedrennan
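
For reference, a minimal sketch of what "already supported" looks like from Python rather than the `aimg` shell: within one long-lived process the weights are loaded onto CUDA once and reused for every prompt. This assumes the `ImaginePrompt` / `imagine_image_files` helpers described in the imaginAIry README; the prompt text and output directory here are made up for illustration.

```python
# Minimal sketch: one process, model loaded once, reused for every prompt.
# Assumes the ImaginePrompt / imagine_image_files API from the README;
# prompt text and outdir are illustrative.
from imaginairy import ImaginePrompt, imagine_image_files

prompts = [
    ImaginePrompt("a scenic mountain lake at sunrise"),
    ImaginePrompt("a red fox in the snow", width=512, height=512),
]

# The first prompt triggers the CUDA load; the rest reuse the cached model.
imagine_image_files(prompts, outdir="./outputs")
```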

Ah! Brilliant, thank you.

I'm specifically wondering if it could be used across multiple terminals; I assume not at the moment.

Astropulse avatar Jan 30 '23 16:01 Astropulse

It could not work across multiple terminals as is. I welcome any ideas you have about what architecture would support that.

brycedrennan avatar Feb 03 '23 22:02 brycedrennan

I'm not super experienced in terms of memory management. I figure that since the model is loaded into VRAM, calling inference on it from another process should just be a matter of knowing where it is. I'm not sure if that's possible with how PyTorch actually handles loading and unloading models, though.

Astropulse avatar Feb 03 '23 23:02 Astropulse

I agree it's theoretically possible. I doubt it's easily possible. Python isn't great at this sort of thing.

brycedrennan avatar Feb 04 '23 01:02 brycedrennan

I'll keep looking into it and see if I find anything. I have a very strange set of restrictions for my Stable Diffusion environment.

On another note, is there any way I can donate to the development of this tool? It's by far the best CLI implementation I've come across.

Astropulse avatar Feb 04 '23 03:02 Astropulse

That's very generous of you. I can set something up to facilitate donations. Do you have a preference or recommendation?

Help me understand the environment, because even if you had multiple shells open you wouldn't be able to generate more than one image at a time due to VRAM limitations, so what would be the point?

brycedrennan avatar Feb 04 '23 04:02 brycedrennan

In my experience Buy Me a Coffee is quite nice for handling donations.

My environment is based on using Lua to call CLI commands. It can either create a terminal and ignore it (making it unable to retrieve generated images or send new commands to the process) or open a terminal and wait for it to close. I've been looking for a way to start one terminal as a 'model loader' and background it, then open other terminals to generate images and retrieve the results. If there's a simpler way to handle that, I'd love to hear ideas; I'm sure there's a solution I'm just unaware of.

Astropulse avatar Feb 04 '23 04:02 Astropulse

Interesting, I'm gonna have to think about that. I haven't used Lua before. The first solution that comes to mind is a FastAPI server with an internal queue: you submit generation requests via HTTP.

Set this up just now: https://www.buymeacoffee.com/brycedrennan

brycedrennan avatar Feb 04 '23 15:02 brycedrennan
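
A rough sketch of that FastAPI idea, not something shipped with imaginAIry: one long-lived process keeps the model in VRAM, and a lock stands in for the internal queue, serializing requests since only one image can be generated at a time anyway. The endpoint name, request fields, and use of `imagine_image_files` are illustrative assumptions.

```python
# Rough sketch of the suggested FastAPI server (not part of imaginAIry).
# One long-lived process keeps the model loaded; a lock serializes requests
# as a stand-in for an internal queue. Endpoint name and fields are illustrative.
import threading

from fastapi import FastAPI
from pydantic import BaseModel

from imaginairy import ImaginePrompt, imagine_image_files

app = FastAPI()
generation_lock = threading.Lock()  # one generation at a time (VRAM-bound)


class GenerateRequest(BaseModel):
    prompt: str
    width: int = 512
    height: int = 512


@app.post("/generate")
def generate(req: GenerateRequest):
    prompt = ImaginePrompt(req.prompt, width=req.width, height=req.height)
    with generation_lock:
        # The model stays resident between requests; only the first call loads it.
        imagine_image_files([prompt], outdir="./outputs")
    return {"status": "done", "outdir": "./outputs"}
```

Run it with something like `uvicorn server:app`; the Lua side (or any other worker) then POSTs prompts over HTTP instead of shelling out to `imagine`, so the model is only loaded once.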

Ah, as expected it was a simple solution that just didn't occur to me. Thank you.

Astropulse avatar Feb 05 '23 05:02 Astropulse

I've added a message advertising the `aimg` shell at the top of the output of every command that is not run via the shell.

brycedrennan avatar Feb 05 '23 17:02 brycedrennan

Also, @Astropulse, thanks for the generous donation!

brycedrennan avatar Feb 05 '23 18:02 brycedrennan