
[Request] Support for llama.cpp

Open gorbypark opened this issue 1 year ago • 5 comments

I'd love to see support for llama.cpp. I am currently running the 13B model (4-bit) on an M2 MacBook Air with 24GB of RAM at about 270ms per token, which all things considered is pretty good.

gorbypark avatar Mar 11 '23 20:03 gorbypark

llama.cpp is an interesting development - it supports Mac M1/M2 and x86 AVX2 instructions (i.e., it's pretty quick for a CPU implementation). I'm able to load the 65B-4bit and get around 850ms per token. That said, it looks like there's a fair bit of coordination and glue needed to get these talking. llama.cpp isn't really set up as a library, nor does it offer an API. Not that those are any sort of heavy lift, but they might be the sort of request that needs to be implemented on their side first before it can be leveraged by others.
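For a rough sense of what the per-token figures quoted in this thread mean in practice, here is a small sketch that converts latency to throughput. The 270 ms and 850 ms values come from the comments above; the 200-token reply length is just an assumption for illustration.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


def seconds_for(n_tokens: int, ms_per_token: float) -> float:
    """Estimate wall-clock seconds to generate n_tokens at a fixed latency."""
    return n_tokens * ms_per_token / 1000.0


# Latencies reported in this thread; the 200-token reply length is hypothetical.
reported = [("13B 4-bit, M2 MacBook Air", 270.0), ("65B 4-bit", 850.0)]
for label, ms in reported:
    print(f"{label}: {tokens_per_second(ms):.2f} tokens/s, "
          f"{seconds_for(200, ms):.0f} s for a 200-token reply")
```

This ignores prompt processing time and assumes latency stays constant over the whole generation, so treat the results as ballpark numbers only.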

jtang613 avatar Mar 11 '23 23:03 jtang613

According to my tests, llama.cpp with 4-bit quantization is much faster than GPU+RAM offload, at least for the 7B model. However, since this is a C++ program that requires compiling, it might be hard to integrate. Possible approaches include building a dynamic library for every OS, or patching the code so it acts as a "backend" for the webui (which also requires compiling the code beforehand), but either way, that would be a lot of work for ooba...
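A third, cruder option would be to leave the compiled program untouched and call it as a subprocess. The sketch below is only an illustration of that idea, not how the webui does (or would do) it. The function names are hypothetical, and the `-m`, `-p` and `-n` flags are what I recall the llama.cpp `main` example accepting at the time, so treat them as assumptions and check the binary's own help output.

```python
import subprocess
import sys
from typing import List


def build_llama_command(binary: str, model_path: str, prompt: str,
                        n_predict: int = 128) -> List[str]:
    """Build an argument list for a llama.cpp-style CLI.

    Assumption: -m (model file), -p (prompt) and -n (tokens to generate)
    match the flags of the compiled binary; verify against its --help.
    """
    return [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]


def run_backend(command: List[str], timeout_s: float = 600.0) -> str:
    """Run the external program and return everything it wrote to stdout."""
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout_s,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command so the sketch runs without a compiled llama.cpp binary.
    demo = [sys.executable, "-c", "print('generated text')"]
    print(run_backend(demo).strip())
```

The obvious downside is that the model gets reloaded on every call and output only arrives once the process exits, which is part of why a proper library or API on the llama.cpp side would be the nicer route.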

Silver267 avatar Mar 12 '23 03:03 Silver267

RAM usage (Ryzen 3700 system):
7B: 4529.34 MB
30B: 20951.50 MB
65B: 41477.73 MB
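As a back-of-the-envelope check on these numbers, one can estimate the effective bits per parameter. This is a sketch under loud assumptions: it takes the nominal parameter counts from the model names (7, 30 and 65 billion), reads "MB" as 2^20 bytes, and attributes all reported memory to the weights, which overstates the per-weight cost since the figure presumably includes other buffers too.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the reported "MB" are mebibytes

# RAM figures as reported above; parameter counts are the nominal sizes
# implied by the model names, not exact counts.
reported_mb = {"7B": 4529.34, "30B": 20951.50, "65B": 41477.73}
nominal_params = {"7B": 7e9, "30B": 30e9, "65B": 65e9}


def bits_per_parameter(ram_mb: float, n_params: float) -> float:
    """Effective bits of RAM used per model parameter."""
    return ram_mb * BYTES_PER_MB * 8 / n_params


for name, ram_mb in reported_mb.items():
    bits = bits_per_parameter(ram_mb, nominal_params[name])
    print(f"{name}: about {bits:.1f} bits per parameter")
```

Under those assumptions every model comes out somewhat above 4 bits per parameter, which is roughly what one would expect from 4-bit quantization plus per-block scaling data and runtime buffers.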

@Silver267 there's a library version one guy is working on: https://github.com/j-f1/forked-llama.cpp/tree/swift. He posted about it here: https://github.com/ggerganov/llama.cpp/issues/23#issuecomment-1465017679

G2G2G2G avatar Mar 12 '23 06:03 G2G2G2G

Looks like there's a draft PR for this: https://github.com/oobabooga/text-generation-webui/pull/447

Loufe avatar Mar 20 '23 18:03 Loufe

https://github.com/PotatoSpudowski/fastLLaMa might be relevant

TheTerrasque avatar Mar 22 '23 09:03 TheTerrasque

Discussion moved to https://github.com/oobabooga/text-generation-webui/issues/575

oobabooga avatar Mar 29 '23 02:03 oobabooga