llamafile
How to set context size? Running Dolphin Mixtral Q4_K_M, it's using too much of my 64 GB of RAM. I want to lower it.
I am running Dolphin Mixtral Q4_K_M on Windows, and I have 64 GB of RAM. How can I set the context length to reduce the amount of RAM being used? I only need a maximum of about 2048 context length. It's eating up 57 GB of my 64. How can I make it use only 30 GB?
Thanks, it would help if I got an answer.
Use the following command-line option:
--ctx-size 2048
thanks
Thanks for helping @vlasky! You can also say -c 0
as an easy way to set the max context size allowed by the model.
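Put together, an invocation might look like the sketch below. The filenames are illustrative placeholders; substitute the actual llamafile binary and model weights you downloaded.

```shell
# Hypothetical filenames; replace with your actual llamafile and model.
# Cap the context window at 2048 tokens to reduce RAM usage:
./llamafile -m dolphin-mixtral.Q4_K_M.gguf --ctx-size 2048

# Or pass -c 0 to use the maximum context size the model allows:
./llamafile -m dolphin-mixtral.Q4_K_M.gguf -c 0
```

Note that a smaller context window shrinks the KV cache, which is where most of the extra RAM beyond the model weights goes.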