text-generation-webui
CLI flag to offload weights to system RAM when not in use
Description
This feature request is to add a CLI flag that offloads/caches model weights to system RAM whenever the software is in an "idle" state. In that idle state, VRAM should theoretically be almost entirely free. This would of course add latency when inference is eventually requested, but one example use case it enables is running an LLM and an LDM (e.g. Stable Diffusion) sequentially on the same GPU without OOM errors.
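For clarity, here is a minimal sketch of the idea in PyTorch (the helper names are hypothetical, not existing code in this repo):

```python
import torch

def offload_to_ram(model):
    """Move the model's weights to system RAM and release cached VRAM."""
    model.to("cpu")
    # Return cached allocator blocks to the driver. Note that the CUDA
    # context itself still occupies a few hundred MB of VRAM.
    torch.cuda.empty_cache()

def restore_to_gpu(model, device="cuda:0"):
    """Bring the weights back onto the GPU before the next inference call."""
    model.to(device)
```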
Additional Context
This functionality differs from `--auto-devices` because it would offload the weights entirely to system RAM while idle. It would be similar to what the Stable Diffusion web UI offers via its `--medvram` flag (https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Optimizations):

> only one [component] is in VRAM at all times, sending others to CPU RAM
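Roughly, that optimization keeps every component in CPU RAM and swaps each one onto the GPU only for its own forward pass. A simplified sketch of that pattern in PyTorch (this hook-based approach is an assumption about the technique, not the SD web UI's actual code):

```python
import torch

def enable_sequential_offload(components, device="cuda:0"):
    """Keep all components in CPU RAM; move each onto the GPU only for
    the duration of its own forward pass, evicting the others first."""
    for module in components:
        module.to("cpu")

        def pre_hook(mod, args):
            for other in components:
                if other is not mod:
                    other.to("cpu")  # evict everything else to system RAM
            mod.to(device)           # then load this component into VRAM
            # Inputs must already be on `device` for the forward to succeed.

        module.register_forward_pre_hook(pre_hook)
```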
I did open a discussion for this, but the lack of activity there has led me to open an issue so this can be tracked.
In the sd_api_pictures extension you can run both the SD web UI and Ooba at once. There's a checkbox to manage VRAM. It takes seconds to switch models stored on an NVMe drive.
https://github.com/oobabooga/text-generation-webui/blob/main/docs/Extensions.md
Interesting, thanks for sharing that. It's not 100% what I'm looking for, since I'd rather unload and reload the text-gen web UI's weights from the SD web UI, not the other way around as that extension does. But I can just add a couple of endpoints that call `unload_model` and `reload_model`, and that'll do it. I'll leave this issue open for a bit longer, but will likely close it if no further discussion comes up.
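For anyone wanting the same setup, a rough sketch of what such endpoints could look like (FastAPI is chosen here for brevity; the routes, and the assumption that `unload_model`/`reload_model` are importable from `modules.models`, are illustrative):

```python
from fastapi import FastAPI

# Assumes these functions exist in this repo as referenced above.
from modules.models import reload_model, unload_model

app = FastAPI()

@app.post("/api/v1/model/unload")
def api_unload():
    unload_model()  # drop the weights so VRAM is free for e.g. SD
    return {"status": "unloaded"}

@app.post("/api/v1/model/reload")
def api_reload():
    reload_model()  # load the previously selected model back into VRAM
    return {"status": "reloaded"}
```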
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.