
Why is the latest version 2x slower?

Open Bloob-beep opened this issue 2 years ago • 6 comments

0.1.32 is 2x slower than 0.1.27

I tried using `use_mlock=True`; it warned me about RLIMIT, and I had to run `ulimit -l unlimited` temporarily, but it still didn't improve. Is anyone else getting the same speed dip?
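The `ulimit -l` step above can also be done from inside the process. A minimal sketch, assuming a Unix system where Python's standard-library `resource` module exposes `RLIMIT_MEMLOCK` (the limit `use_mlock` runs up against):

```python
import resource

# Inspect the current memlock limits; mlock()-ing the model needs the
# soft limit to cover the model size (or be RLIM_INFINITY).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(f"memlock soft={soft} hard={hard}")

# Try to raise the soft limit to the hard limit -- the in-process
# equivalent of `ulimit -l`. Raising the hard limit itself needs root.
if soft != hard:
    try:
        resource.setrlimit(resource.RLIMIT_MEMLOCK, (hard, hard))
    except (ValueError, OSError):
        pass  # not permitted here; fall back to `ulimit -l` in the shell
```

If the soft limit still ends up smaller than the model, `use_mlock=True` will warn and fall back rather than lock the pages.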

Bloob-beep avatar Apr 12 '23 03:04 Bloob-beep

I also experienced a significant slowdown that made performance go from "barely tolerable" to "unusable" on my system. It's particularly felt when initially loading the model (in oobabooga), which used to take single digit seconds and now takes minutes.

cmoncure avatar Apr 12 '23 12:04 cmoncure

Yes, very slow. The problem seems to be on llama.cpp's end; I've seen a few issues about speed there recently.

nigh8w0lf avatar Apr 12 '23 21:04 nigh8w0lf

Can we install the older version again until this is fixed? How?

CyberTimon avatar Apr 12 '23 21:04 CyberTimon

@CyberTimon totally, you can check out the PyPI history here -> https://pypi.org/project/llama-cpp-python/#history and try out older versions.

You can install a specific version with `pip install llama-cpp-python==<version>`.
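To keep a downgrade from being silently undone by a later `pip install -U`, the version can also be pinned in a requirements file (the version number below is only an example of the pinning syntax, not a recommendation):

```
# requirements.txt -- pin to a known-good release
llama-cpp-python==0.1.27
```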

abetlen avatar Apr 12 '23 21:04 abetlen

I'm feeling this as well. Running llama-cpp manually is noticeably faster than running it through the API.

AndreiSva avatar Apr 13 '23 05:04 AndreiSva

Just checked it. llama.cpp is 40ms per token for me and the Python bindings are 200ms per token, so it's much slower. Sadly, downgrading to version 0.1.27 is still slow.

CyberTimon avatar Apr 13 '23 09:04 CyberTimon

There have been lots of performance enhancements in the upstream llama.cpp, and I am now getting very good performance.

Can someone confirm that this performance regression is fixed?

gjmulder avatar May 15 '23 11:05 gjmulder