
Faster loading of the model

Open kig opened this issue 1 year ago • 5 comments

I was playing with the 65B model, and it took a minute to read the files. If you wrap the model loader loop with `#pragma omp parallel for` and add `-fopenmp` to the compiler flags, you can drop it to 18 seconds.

kig avatar Mar 13 '23 08:03 kig
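The change proposed above can be sketched roughly as follows. This is not the actual llama.cpp loader code; `load_tensor` is a hypothetical stand-in for the per-tensor file read, and the tensor count is arbitrary. The key point is that the loop iterations are independent, so the pragma can fan them out across cores; compiled without `-fopenmp`, the pragma is ignored and the loop simply runs serially, which is still correct.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for reading one tensor's data from disk;
// returns the number of bytes "read" for that tensor.
static int64_t load_tensor(int i) {
    return static_cast<int64_t>(i) * i;
}

// The proposed change: parallelize the per-tensor load loop with OpenMP.
// Requires compiling with -fopenmp; otherwise the pragma is a no-op.
int64_t load_all(int n_tensors) {
    std::vector<int64_t> sizes(n_tensors);
    #pragma omp parallel for
    for (int i = 0; i < n_tensors; ++i) {
        sizes[i] = load_tensor(i);  // each iteration writes a distinct slot, so this is safe
    }
    int64_t total = 0;
    for (int64_t s : sizes) total += s;
    return total;
}
```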

Great idea. We prefer not to use `-fopenmp`; the implementation should use `#include <thread>` instead.

ggerganov avatar Mar 13 '23 08:03 ggerganov
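The same parallel load can be expressed with only the standard `<thread>` header, along the lines suggested. This is a sketch, not the code that was eventually merged: worker threads pull tensor indices from a shared atomic counter until none remain, and `load_tensor` is again a hypothetical stand-in for the real file read.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for reading one tensor's data from disk.
static int64_t load_tensor(int i) {
    return static_cast<int64_t>(i) * i;
}

// OpenMP-free version using std::thread: each worker claims the next
// unprocessed tensor index via an atomic counter, then all are joined.
int64_t load_all_threaded(int n_tensors, unsigned n_threads) {
    std::vector<int64_t> sizes(n_tensors);
    std::atomic<int> next{0};

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = next.fetch_add(1); i < n_tensors; i = next.fetch_add(1)) {
                sizes[i] = load_tensor(i);  // each index is written by exactly one thread
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }

    int64_t total = 0;
    for (int64_t s : sizes) total += s;
    return total;
}
```

The atomic-counter approach also load-balances naturally when tensors have very different sizes, which a static split across threads would not.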

What about TBB? https://github.com/oneapi-src/oneTBB (license: Apache)

I remember that the mold linker project also uses it.

kassane avatar Mar 15 '23 17:03 kassane

Not familiar with TBB, but most likely the answer is no.

ggerganov avatar Mar 15 '23 20:03 ggerganov

I have some experiments with optimizing large-file read I/O in https://gist.github.com/kig/357a4193be54915d142f1db6063bc929 and https://github.com/kig/fast_read_optimizer, if you want to overkill it...

kig avatar Mar 16 '23 01:03 kig
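For context on the kind of technique those experiments explore: one common way to speed up reading a huge file is to issue reads for disjoint chunks from several threads using `pread(2)`, which takes an explicit offset and so lets all threads share one file descriptor without contending over a seek position. A minimal POSIX sketch (chunk split and thread count are arbitrary choices here, not taken from the linked code):

```cpp
#include <fcntl.h>
#include <thread>
#include <unistd.h>
#include <vector>

// Read the whole file at `path` into `out`, splitting it into `n_threads`
// disjoint ranges that are read concurrently with pread(2). Because pread
// takes an explicit offset, the threads never move a shared file position.
bool read_file_parallel(const char * path, std::vector<char> & out, unsigned n_threads) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0) { close(fd); return false; }
    out.resize(static_cast<size_t>(size));

    off_t chunk = (size + n_threads - 1) / n_threads;  // ceil(size / n_threads)
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            off_t begin = static_cast<off_t>(t) * chunk;
            off_t end   = begin + chunk < size ? begin + chunk : size;
            while (begin < end) {
                // pread may return a short count; loop until this range is done.
                ssize_t n = pread(fd, out.data() + begin, static_cast<size_t>(end - begin), begin);
                if (n <= 0) break;  // error or unexpected EOF; real code would report it
                begin += n;
            }
        });
    }
    for (auto & w : workers) w.join();
    close(fd);
    return true;
}
```

Whether this beats a single sequential read depends heavily on the storage: it tends to help on NVMe SSDs with deep queue parallelism and can hurt on spinning disks, where it turns a sequential scan into seeks.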

Has this been implemented yet?

maxtriano avatar Jun 07 '23 14:06 maxtriano