Philpax
Your best bet is to build your own server based around `llm`; `llm-cli` is basically just a demo application for `llm`. You can see how this might be done in...
It's been merged: https://github.com/ggerganov/llama.cpp/pull/1405 There doesn't seem to be a migration path at present, so let's wait a bit: https://github.com/ggerganov/llama.cpp/issues/1408
This is done in #226, but I'd like to set up a migration path before I close this.
No migration path for now. See #261
We believe so, but it hasn't been tested. Additionally, the `build.rs` for `ggml-sys` wouldn't pass the right build flags, but that should be easy enough to fix. If you'd like...
Is this done now?
How different is this to the original GPT-J implementation? Can the codegen model be implemented by calling into GPT-J with a parameter to use a slightly different computation graph? I'd...
I don't have enough RAM to test, but I'd suggest looking at performance numbers for `llama.cpp` - we should be about on par (barring any improvements that we haven't kept...
How weird... is that q4 or f16?
Ok, just tested with https://huggingface.co/xzuyn/GPT-2-124M-ggml-q4_1/blob/main/ggml-model-q4_1.bin on macOS:

```
# cargo run --bin llm gpt2 infer -m models/gpt2/GPT-2-124M-ggml-q4_1.bin -p "1 + 2 = "
    Finished dev [unoptimized + debuginfo] target(s) in...
```