llama.cpp
Reduce model loading time
Hello!
I noticed that the model loader does not use buffered I/O, so I added a small piece of code that gives the input stream a buffer. I only measured LLaMA 7B on my M1 Pro MacBook, where the loading time dropped from 1316 ms to 749 ms (about 43% less).
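
For readers unfamiliar with the technique, here is a minimal sketch of stream buffering in C++. It assumes the loader reads the model through a `std::ifstream`. The helper name `read_file_buffered` and the 1 MiB default buffer size are hypothetical and chosen for illustration only; they are not taken from the actual change.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): reads a whole file through a
// std::ifstream that uses a caller-sized stream buffer.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file_buffered(const std::string & path,
                                     std::size_t buf_size = 1024 * 1024) {
    assert(buf_size > 0);

    // Declared before the stream so that it outlives it.
    std::vector<char> io_buf(buf_size);

    std::ifstream fin;
    // Install the buffer before opening the file. Whether pubsetbuf has any
    // effect once I/O has started is implementation-defined.
    fin.rdbuf()->pubsetbuf(io_buf.data(),
                           static_cast<std::streamsize>(io_buf.size()));
    fin.open(path, std::ios::binary);
    if (!fin) {
        return {};
    }

    // Many small reads, as a loader parsing tensors might issue. With a large
    // stream buffer, most of them are served from memory.
    std::vector<char> data;
    char chunk[4096];
    while (fin.read(chunk, sizeof(chunk)) || fin.gcount() > 0) {
        data.insert(data.end(), chunk, chunk + fin.gcount());
    }
    return data;
}
```

The gain depends on the platform, the standard library, and the read pattern, so it is worth timing on your own machine before relying on it.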