
Limit CPU RAM usage when offloading model to GPU

0xbitches opened this issue 1 year ago · 5 comments

Description

Would like to request a feature to limit Python's RAM usage when loading a model to the GPU. Right now, loading llama-30b-int4 uses up all of my 32GB of system RAM and occasionally crashes my other programs. I tried the --cpu-memory flag, but it appears to do nothing for me.
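For what it's worth, my understanding is that --cpu-memory is meant to end up as the max_memory map that accelerate uses inside from_pretrained. A minimal sketch of driving that path directly, assuming an HF-format checkpoint; the model name and memory limits below are placeholders, not recommendations:

```python
# Sketch: cap per-device memory at load time via accelerate's max_memory
# map (which, as I understand it, is what --cpu-memory is supposed to set).
# Checkpoint name and limits are placeholders for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",        # placeholder HF-format checkpoint
    device_map="auto",                       # let accelerate place the layers
    max_memory={0: "20GiB", "cpu": "8GiB"},  # cap GPU 0 and CPU RAM separately
    offload_folder="offload",                # spill whatever still doesn't fit to disk
)
```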

0xbitches · Mar 14 '23 09:03

A workaround is to increase your swap size.

oobabooga · Mar 14 '23 09:03

> increase your swap size.

I tried changing the virtual memory size on Windows, but it does not appear to help. Maybe I will have to settle for using a third-party program to limit RAM usage.

0xbitches · Mar 14 '23 10:03

Did you try the --cpu-memory flag?

RazeLighter777 · Mar 14 '23 11:03

@RazeLighter777 As I mentioned in the first comment, that was the first thing I tried. Sadly it does not work (at least on Windows, it seems).
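If --cpu-memory is a dead end on Windows, a separate lever that may shrink the load-time RAM spike is transformers' low_cpu_mem_usage flag, which skips materializing a second full copy of the weights in system memory. A hedged sketch (the checkpoint name is a placeholder, and this targets the loading peak, not steady-state usage):

```python
# Sketch: lower peak system RAM during loading by streaming weights in
# directly instead of initializing the model and then overwriting it.
# Checkpoint name is a placeholder; device_map="auto" enables this
# behavior implicitly anyway, but the flag is shown here for clarity.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-30b-hf",  # placeholder
    low_cpu_mem_usage=True,            # avoid the duplicate in-RAM copy
    device_map="auto",
)
```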

0xbitches · Mar 14 '23 16:03

Same here, except I can't load the 30B model in CPU mode at all (which I could do with llama.cpp).
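If llama.cpp itself handles the 30B model on your machine, its Python binding (llama-cpp-python) may be a workable pure-CPU route as well. A minimal sketch, assuming a GGML-quantized model file; the path and thread count are placeholders:

```python
# Sketch: run the model fully on CPU through llama-cpp-python, using a
# GGML-quantized file like the one llama.cpp consumes.
# model_path and n_threads are placeholders for your setup.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-30b-q4_0.bin", n_threads=8)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```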

slashedstar · Mar 15 '23 00:03

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.

github-actions[bot] · Apr 14 '23 23:04