llama.cpp

Reduce model loading time

Open maekawatoshiki opened this issue 1 year ago • 0 comments

Hello!

I noticed that the model loader does not use buffered I/O, so I added a small piece of code that gives the input stream a buffer. I have only measured the loading time for LLaMA 7B on my M1 Pro MacBook, but there it dropped from 1316 ms to 749 ms.
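For readers unfamiliar with the technique, here is a minimal sketch of giving a `std::ifstream` a large user-supplied buffer. It is an illustration only, not necessarily the exact change proposed here: the helper name `read_file_buffered` and the 1 MiB buffer size are assumptions. Note that the effect of `pubsetbuf` on a file stream is implementation-defined, so the sketch calls it before opening the file, which is the most portable order.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): reads a whole file through an
// ifstream whose stream buffer is backed by a large user-supplied array.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file_buffered(const std::string& path,
                                     std::size_t buf_size = 1024 * 1024) {
    // The buffer is declared before the stream so that it outlives it.
    std::vector<char> io_buf(buf_size);

    std::ifstream fin;
    // Install the buffer before open(); whether it is honored is
    // implementation-defined, but this order works on common standard libraries.
    fin.rdbuf()->pubsetbuf(io_buf.data(),
                           static_cast<std::streamsize>(io_buf.size()));
    fin.open(path, std::ios::binary);
    if (!fin) {
        return {};
    }

    // Many small reads, similar in spirit to a loader that reads one field at
    // a time; with a large stream buffer most of them are served from memory.
    std::vector<char> data;
    char chunk[64];
    while (fin.read(chunk, sizeof(chunk)) || fin.gcount() > 0) {
        data.insert(data.end(), chunk, chunk + fin.gcount());
    }
    return data;
}
```

Calling `read_file_buffered("model.bin")` (a placeholder path) returns the file contents. Any speedup depends on the platform, the standard library, and the read pattern, so it should be measured rather than assumed.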

maekawatoshiki, Mar 12 '23 10:03