llama.cpp: Reduce model loading time


Open: maekawatoshiki opened this issue 1 year ago

Hello!

I noticed that the model loader does not use buffered I/O, so I added some code to buffer the reads. I have only measured loading time for LLaMA 7B on my M1 Pro MacBook, but there it dropped from 1316 ms to 749 ms, a reduction of about 43%.
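The issue does not include the patch itself. As a rough illustration only, and assuming the loader reads the model file through a `std::ifstream`, here is one common way to give the stream a larger read buffer with `rdbuf()->pubsetbuf`. The helper names `write_test_file` and `load_values`, the 1 MiB buffer size, and the int32 file layout are hypothetical, and this is not necessarily how the actual change does it. The loader reads values one at a time, mimicking the many small reads a model loader makes for header fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write `count` int32 values to a binary file so the
// loader below has something to read.
bool write_test_file(const std::string& path, std::int32_t count) {
    std::ofstream out(path, std::ios::binary);
    for (std::int32_t i = 0; i < count; ++i) {
        out.write(reinterpret_cast<const char*>(&i), sizeof(i));
    }
    return static_cast<bool>(out);
}

// Hypothetical loader: reads int32 values one at a time. When buffer_bytes > 0,
// a caller-sized buffer is installed on the filebuf before the file is opened.
std::vector<std::int32_t> load_values(const std::string& path, std::size_t buffer_bytes) {
    // Declared before the stream so it is destroyed after the stream.
    std::vector<char> buf(buffer_bytes);
    std::ifstream fin;
    if (buffer_bytes > 0) {
        // Effect is implementation-defined; calling before open() is the portable order.
        fin.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size()));
    }
    fin.open(path, std::ios::binary);

    std::vector<std::int32_t> values;
    std::int32_t v;
    while (fin.read(reinterpret_cast<char*>(&v), sizeof(v))) {
        values.push_back(v);
    }
    return values;
}
```

Usage: `load_values("model.bin", 1 << 20)` reads with a 1 MiB buffer, while `load_values("model.bin", 0)` keeps the library's default buffer; both return the same data. The buffer must outlive the stream, which is why it is declared before `fin`.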

Mar 12 '23 10:03 maekawatoshiki
