imaginAIry
Better documentation of how to avoid loading the model onto CUDA for every generation
Every time I try to use the "imagine 'something...'" command it loads the Hugging Face model into CUDA. Is there any way to keep the model loaded on CUDA so the process runs faster? I want to run it in workers, but I don't know if there is any way to load the model into CUDA just once so the workers don't have to load it on every request.
It should only be doing that if you're switching models or generating the first image. Please provide logs.
I'll reopen when we get more info.
This looks like a question I also had, so I'll reply here instead of creating a new issue.
I believe what xzitlou was asking is if there is a way to keep the model loaded for more than one batch of prompts/tasks.
Personally I would be interested in a way to load the model and run inference on it separately. Example:
1. Load the model into VRAM/RAM with one command.
2. Call a command that uses that model and just performs inference on it.
3. Run another command with that model, but different prompts/sizes.
This would prevent the need to load the model every time you run a new task, and would open up a lot of uses.
I see I misunderstood. My bad. This is already supported so we should figure out how to improve the docs. Yeah super annoying to have to reload the model every time.
Try running 'aimg'
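If you're driving it from Python instead of the shell, a single long-lived process will also keep the model loaded between prompts. A rough sketch using the Python API from the README (imagine / ImaginePrompt; exact names and arguments may vary by version):

```python
# Minimal sketch, assuming imaginairy's documented Python API
# (imagine, ImaginePrompt); exact names/arguments may differ by version.
from imaginairy import imagine, ImaginePrompt

# The model is loaded on the first call and reused for later prompts
# made from the same process.
prompts = [
    ImaginePrompt("a scenic landscape", seed=1),
    ImaginePrompt("a photo of a dog", seed=2),
]

for i, result in enumerate(imagine(prompts)):
    result.save(f"image_{i}.jpg")  # assumes results expose a save() helper
```

Every prompt generated from that process reuses the model that was loaded on the first call.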
Ah! Brilliant, thank you.
I'm specifically wondering if it could be used across multiple terminals, I assume not at the moment.
It could not work across multiple terminals as is. I welcome any ideas you have about what architecture would support that.
I'm not super experienced in terms of memory management. I figure since the model is loaded into VRAM, calling inference on it from another process should just be a matter of knowing where it is. I'm not sure if this is possible with how pytorch actually handles loading and unloading models though.
I agree it's theoretically possible. I doubt it's easily possible. Python isn't great at this sort of thing.
I'll keep looking into it and see if I find anything. I have a very strange set of restrictions for my stable diffusion environment.
On another note, is there any way I can donate to the development of this tool? It's by far the best cli implementation I've come across.
That's very generous of you. I can set something up to facilitate donations. Do you have a preference or recommendation?
Help me understand the environment. Even if you had multiple shells open, you wouldn't be able to generate more than one image at a time due to VRAM limitations, so what would be the point?
In my experience Buy Me a Coffee is quite nice for handling donations.
My environment is based on using Lua to call CLI commands. It can either create a terminal and ignore it (making it unable to retrieve generated images or send new commands to the process), or open a terminal and wait for it to close. I've been looking for a way to start one terminal as a 'model loader' and background it, then open other terminals to generate images and retrieve the results. If there's a simpler way to handle that I'd love to hear ideas; I'm sure there's a solution I'm just unaware of.
Interesting, I'm going to have to think about that. I haven't used Lua before. The first solution that comes to mind is a FastAPI server with an internal queue; you submit generation requests via HTTP.
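Something along these lines is what I have in mind. It's only a sketch, and the imaginairy calls (imagine, ImaginePrompt, result.save) are assumed from the README and may differ between versions:

```python
# Hypothetical sketch: a FastAPI server that keeps the model in one process
# and serializes generation requests through an internal queue.
import queue
import threading
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

from imaginairy import imagine, ImaginePrompt  # assumed Python API, per the README

app = FastAPI()
jobs = queue.Queue()   # pending generation requests
results = {}           # job_id -> output file path


class GenerationRequest(BaseModel):
    prompt: str
    width: int = 512
    height: int = 512


def worker():
    """Single worker thread: only one generation runs at a time, and the
    model stays loaded in this process between requests."""
    while True:
        job_id, req = jobs.get()
        prompt = ImaginePrompt(req.prompt, width=req.width, height=req.height)
        for result in imagine([prompt]):
            path = f"/tmp/{job_id}.jpg"  # hypothetical output location
            result.save(path)            # assumes results expose a save() helper
            results[job_id] = path
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


@app.post("/generate")
def generate(req: GenerationRequest):
    """Enqueue a generation request and return a job id to poll."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, req))
    return {"job_id": job_id}


@app.get("/result/{job_id}")
def get_result(job_id: str):
    """Return the output path once the job has finished."""
    if job_id in results:
        return {"status": "done", "path": results[job_id]}
    return {"status": "pending"}
```

Because a single worker thread owns the model, requests from any number of clients get serialized, which also sidesteps the one-image-at-a-time VRAM limit mentioned above.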
Set this up just now https://www.buymeacoffee.com/brycedrennan
Ah, as expected it was a simple solution that just didn't occur to me. Thank you.
I've added a message at the top of every command that is not run via the aimg shell, advertising the shell.
Also, @Astropulse thanks for the generous donation!