
How to set context size? Running Dolphin Mixtral Q4_K_M, using too much of my 64 GB of RAM. Want to lower it.

Open FemBoxbrawl opened this issue 9 months ago • 1 comments

I am running Dolphin Mixtral Q4_K_M on Windows, and I have 64 GB of RAM. How can I set the context length to reduce the amount of RAM being used? I only need a maximum context length of about 2048. It's eating up 57 GB of my 64. How can I make it use only 30 GB?

Thanks, it would help if I got an answer.

FemBoxbrawl May 16 '24 18:05

Use the following command-line option:

--ctx-size 2048

vlasky May 17 '24 06:05

thanks

FemBoxbrawl May 17 '24 19:05

Thanks for helping @vlasky! You can also say -c 0 as an easy way to set the max context size allowed by the model.
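For a rough sense of why the context size matters for RAM, here is a back-of-the-envelope sketch of the key/value cache size at 2048 tokens versus the model's full context. The hyperparameters below (32 layers, 8 KV heads, head dimension 128, 32768-token maximum context, f16 cache entries) are assumptions about Mixtral 8x7B and are not stated anywhere in this thread; the function name is made up for illustration. It only estimates the cache, not the model weights or compute buffers, which the context size does not shrink.

```python
# Rough KV-cache size estimate as a function of context length.
# All constants are ASSUMED Mixtral 8x7B values, not taken from this thread.
N_LAYERS = 32            # transformer layers (assumption)
N_KV_HEADS = 8           # key/value heads under grouped-query attention (assumption)
HEAD_DIM = 128           # dimension per attention head (assumption)
BYTES_PER_VALUE = 2      # f16 cache entries (assumption)
MAX_MODEL_CTX = 32768    # maximum context of the model (assumption)


def kv_cache_bytes(ctx_size: int) -> int:
    """Bytes needed to cache keys and values for ctx_size tokens."""
    # The leading 2 accounts for storing both a key and a value per position.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return ctx_size * per_token


if __name__ == "__main__":
    for ctx in (2048, MAX_MODEL_CTX):
        mib = kv_cache_bytes(ctx) / 2**20
        print(f"ctx={ctx}: about {mib:.0f} MiB of KV cache")
```

Under these assumptions the cache comes to about 256 MiB at 2048 tokens and about 4 GiB at the full 32768 tokens, so a small --ctx-size trims the cache while the quantized weights remain the bulk of the memory use.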

jart May 17 '24 20:05