
Limit CPU RAM usage when offloading model to GPU

0xbitches opened this issue 1 year ago · 5 comments

Description

Would like to request a feature to limit Python's RAM usage when loading a model to the GPU. Right now, loading llama-30b-int4 uses up all of my 32GB of system RAM and occasionally crashes my other programs. I tried the --cpu-memory flag, but it appears to do nothing for me.
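For what it's worth, my understanding is that --cpu-memory is meant to end up as the max_memory map that accelerate uses inside from_pretrained. A minimal sketch of driving that path directly, assuming an HF-format checkpoint; the model name and memory limits below are placeholders, not recommendations:

```python
# Sketch: cap per-device memory at load time via accelerate's max_memory
# map (which, as I understand it, is what --cpu-memory is supposed to set).
# Checkpoint name and limits are placeholders for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",        # placeholder HF-format checkpoint
    device_map="auto",                       # let accelerate place the layers
    max_memory={0: "20GiB", "cpu": "8GiB"},  # cap GPU 0 and CPU RAM separately
    offload_folder="offload",                # spill whatever still doesn't fit to disk
)
```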

0xbitches · Mar 14 '23 09:03

A workaround is to increase your swap size.

oobabooga · Mar 14 '23 09:03

> increase your swap size.

I tried changing the virtual memory size on Windows, but it does not appear to help. Maybe I will have to settle for using a third-party program to limit RAM usage.

0xbitches · Mar 14 '23 10:03

Did you try the --cpu-memory flag?

RazeLighter777 · Mar 14 '23 11:03

@RazeLighter777 As I mentioned in the first comment, that was the first thing I tried. Sadly it does not work (at least on Windows, it seems).
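If --cpu-memory is a dead end on Windows, a separate lever that may shrink the load-time RAM spike is transformers' low_cpu_mem_usage flag, which skips materializing a second full copy of the weights in system memory. A hedged sketch (the checkpoint name is a placeholder, and this targets the loading peak, not steady-state usage):

```python
# Sketch: lower peak system RAM during loading by streaming weights in
# directly instead of initializing the model and then overwriting it.
# Checkpoint name is a placeholder; device_map="auto" enables this
# behavior implicitly anyway, but the flag is shown here for clarity.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",  # placeholder
    low_cpu_mem_usage=True,            # avoid the duplicate in-RAM copy
    device_map="auto",
)
```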

0xbitches · Mar 14 '23 16:03

Same here, except I can't load the 30B model in CPU mode at all (which I could do with llama.cpp).
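If llama.cpp itself handles the 30B model on your machine, its Python binding (llama-cpp-python) may be a workable pure-CPU route as well. A minimal sketch, assuming a GGML-quantized model file; the path and thread count are placeholders:

```python
# Sketch: run the model fully on CPU through llama-cpp-python, using a
# GGML-quantized file like the one llama.cpp consumes.
# model_path and n_threads are placeholders for your setup.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-30b-q4_0.bin", n_threads=8)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```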

slashedstar · Mar 15 '23 00:03

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

github-actions[bot] · Apr 14 '23 23:04