Option for pre-loading specific models into memory

Open toasterrepairman opened this issue 2 years ago • 7 comments

Not sure if this feature is possible, but I'd like the ability to specify (preferably in my .env file) models to leave pre-loaded in memory. It shouldn't be the default choice, but it would allow bandwidth-constrained servers to run faster, as well as reducing overall latency when running as an API.

Thanks for making this, and I look forward to seeing your plans for the API refactor! :smiley:

toasterrepairman avatar Mar 22 '23 23:03 toasterrepairman

Here are some thoughts:

But maybe a simpler way would be to declare a path as a tmpfs in the docker-compose file, and have the api code copy the files into that tmpfs location at startup.
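A minimal sketch of that startup copy, assuming the tmpfs is already mounted at some path inside the container. The function names, the `PRELOAD_MODELS` variable, and the comma-separated format are all hypothetical; nothing here is taken from the serge codebase.

```python
import os
import shutil
from pathlib import Path


def models_from_env(var="PRELOAD_MODELS"):
    """Read a comma-separated list of model file names from the environment.

    The variable name is an assumption made for this sketch.
    """
    raw = os.environ.get(var, "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def preload_models(names, src_dir, ram_dir):
    """Copy the named model files from src_dir into ram_dir (a tmpfs mount).

    Returns the paths of the copies that are available in ram_dir.
    """
    src_dir, ram_dir = Path(src_dir), Path(ram_dir)
    ram_dir.mkdir(parents=True, exist_ok=True)
    available = []
    for name in names:
        src = src_dir / name
        dst = ram_dir / name
        if not src.is_file():
            continue  # skip names that do not exist on disk
        # Copy only when the file is missing or its size differs.
        if not dst.exists() or dst.stat().st_size != src.stat().st_size:
            shutil.copyfile(src, dst)
        available.append(dst)
    return available
```

The API would then load models from the returned paths instead of the on-disk ones. Note that files on a tmpfs count against RAM, so each preloaded model costs its full size in memory.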

Another thought: using https://github.com/hyperonym/basaran

I'm curious what other ideas people will come up with on that matter.

thomasleveil avatar Mar 23 '23 07:03 thomasleveil

Hi, on my side I already tested this by mounting the /var/lib/docker directory and the repository directory on tmpfs (manually, at the system level). It is hardly any faster if you already have an NVMe drive; in any case, I didn't notice much difference.

maxime-dlabai avatar Mar 23 '23 11:03 maxime-dlabai

On the other hand, I wonder whether raising the priority of the process when it starts generating a response would be worthwhile, e.g. "chrt -f 90 llama". And above all, wouldn't it be more efficient to run the process directly, outside of Docker?
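As a rough sketch of what that could look like from the API side, the helper below wraps a command in `chrt -f <priority>`. The function names are hypothetical, and actually launching a process under SCHED_FIFO normally requires root or CAP_SYS_NICE, which a default Docker container may not grant.

```python
import subprocess


def fifo_command(cmd, priority=90):
    """Prefix cmd with chrt so it runs under SCHED_FIFO at the given priority."""
    if not 1 <= priority <= 99:
        raise ValueError("SCHED_FIFO priority must be between 1 and 99")
    return ["chrt", "-f", str(priority), *cmd]


def run_with_priority(cmd, priority=90):
    """Run cmd under chrt; fails if the caller lacks the needed privileges."""
    return subprocess.run(fifo_command(cmd, priority), check=True)
```

For example, `fifo_command(["llama"])` builds the same command line as the one quoted above.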

maxime-dlabai avatar Mar 23 '23 11:03 maxime-dlabai

I just cloned and built the branch with mmap allocation, and yes, it is faster than the main branch: it is almost instant once the process is running. https://github.com/ggerganov/llama.cpp/tree/mmap I am also trying some modifications to the memory allocation in main.cpp.
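To illustrate the general idea (this is not the code from that branch), here is a small Python sketch of memory-mapping a weights file. The function name is hypothetical. With a mapping, pages are read lazily on first access and stay in the OS page cache, so a later process mapping the same file can reuse them instead of reading from disk again.

```python
import mmap


def map_model(path):
    """Return a read-only memory mapping of the whole file at path."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping remains valid after the
        # file object is closed, because mmap keeps its own descriptor.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Slicing the result, e.g. `map_model(path)[:4]`, touches only the pages that back those bytes rather than loading the whole file.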

maxime-dlabai avatar Mar 23 '23 12:03 maxime-dlabai

Here is the main version:

image

maxime-dlabai avatar Mar 23 '23 12:03 maxime-dlabai

And here is the custom version with mmap allocation (I added some code to re-enable AVX512 on this version).

image

maxime-dlabai avatar Mar 23 '23 12:03 maxime-dlabai

Sorry, here is the latest main version. The previous screenshot was of an older main version, from about 2 days earlier I think.

image

maxime-dlabai avatar Mar 23 '23 12:03 maxime-dlabai

Closed via #129

gaby avatar Apr 05 '23 03:04 gaby