text-generation-webui
Limit CPU RAM usage when offloading model to GPU
Description I would like to request a feature to limit Python's RAM usage when loading a model onto the GPU. Right now, loading llama-30b-int4 uses up all of my 32 GB of system RAM and occasionally crashes my other programs. I tried the --cpu-memory flag, but it appears to do nothing for me.
Additional Context
A workaround is to increase your swap size.
I tried changing the virtual memory size on Windows, but it does not appear to help. Maybe I will have to settle for using a third-party program to limit RAM usage.
Did you try the --cpu-memory flag?
@RazeLighter777 I mentioned in the first comment that it's the first thing I tried. Sadly, it does not work (at least on Windows, it seems).
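For reference, here is a minimal sketch of how per-device memory caps are usually expressed when loading through Hugging Face transformers/accelerate, which is the mechanism a flag like --cpu-memory would typically map onto. The model path and memory limits below are placeholders, and this is not text-generation-webui's actual loading code. Note that max_memory only controls where weights end up being placed; the temporary RAM spike while the checkpoint is being read can still exceed the cap.

```python
# Minimal sketch, assuming a transformers/accelerate loading path.
# Model path and memory limits are placeholders for illustration.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "models/llama-30b",                        # placeholder model path
    device_map="auto",                         # let accelerate split layers across devices
    max_memory={0: "20GiB", "cpu": "16GiB"},   # cap GPU 0 and CPU RAM used for placed weights
    low_cpu_mem_usage=True,                    # avoid holding a full extra copy in RAM while loading
)
```

If the 4-bit model is loaded through a separate quantized code path rather than this one, these arguments may not apply at all, which could explain why the flag seems to have no effect.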
Same here, except I can't load the 30B model in CPU mode at all (which I did manage with llama.cpp).
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.