Philpax
Your best bet is to build your own server based around `llm`; `llm-cli` is basically just a demo application for `llm`. You can see how this might be done in...
It's been merged: https://github.com/ggerganov/llama.cpp/pull/1405 There doesn't seem to be a migration path at present, so let's wait a bit: https://github.com/ggerganov/llama.cpp/issues/1408
This is done in #226, but I'd like to set up a migration path before I close this.
No migration path for now. See #261
We believe so, but it hasn't been tested. Additionally, the `build.rs` for `ggml-sys` wouldn't pass the right build flags, but that should be easy enough to fix. If you'd like...
Is this done now?
How different is this to the original GPT-J implementation? Can the codegen model be implemented by calling into GPT-J with a parameter to use a slightly different computation graph? I'd...
I don't have enough RAM to test, but I'd suggest looking at performance numbers for `llama.cpp` - we should be about on par (barring any improvements that we haven't kept...
How weird... is that q4 or f16?
Ok, just tested with https://huggingface.co/xzuyn/GPT-2-124M-ggml-q4_1/blob/main/ggml-model-q4_1.bin on macOS:

```
# cargo run --bin llm gpt2 infer -m models/gpt2/GPT-2-124M-ggml-q4_1.bin -p "1 + 2 = "
    Finished dev [unoptimized + debuginfo] target(s) in...
```