llama.cpp
LLM inference in C/C++
The goal of this refactor is to allow reusing the model execution while using streams other than stdin/stdout for interaction. In my case, I'd like to implement a simple TCP server...
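To make the idea concrete, here is a minimal sketch of that kind of server, assuming the refactor exposes the model loop as something that takes a pair of streams. The port, the echo stub, and the `run_inference_loop` name are all invented for illustration, not the proposed design:

```cpp
#include <cstdio>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

// Placeholder for the refactored model loop: here it just echoes lines back.
static void run_inference_loop(FILE * in, FILE * out) {
    char buf[512];
    while (fgets(buf, sizeof(buf), in)) {
        fprintf(out, "model> %s", buf);
        fflush(out);
    }
}

int main() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);

    sockaddr_in addr = {};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port        = htons(8080); // arbitrary example port

    bind(srv, (sockaddr *) &addr, sizeof(addr));
    listen(srv, 1);

    int client = accept(srv, NULL, NULL);

    // Wrap the socket in stdio streams so interaction code written against
    // fgets()/fprintf() could stay largely unchanged.
    FILE * in  = fdopen(client, "r");
    FILE * out = fdopen(dup(client), "w");

    run_inference_loop(in, out);

    fclose(in);
    fclose(out);
    close(srv);
    return 0;
}
```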
The following is a proposed template for creating new issues. If people think the tone could be improved, I'd appreciate feedback! ___ # Prerequisites Please answer the following questions for...
I was tinkering with the code and made the following change at line 977 of `main.cpp` (as it seemed wrong to me), *from*:
```C
if (embd.size() > params.n_batch) {
    break;
}
```
...
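The preview cuts off before the replacement code, so the actual change isn't shown here. As background on what that check guards, a hedged sketch of how pending tokens are typically consumed in `n_batch`-sized chunks rather than breaking out of the loop; `eval_tokens` is a hypothetical stand-in for the real evaluation call:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the actual evaluation call in main.cpp.
static void eval_tokens(const int * tokens, int n_tokens, int n_past) {
    (void) tokens; (void) n_tokens; (void) n_past; // stub
}

// Feed all pending tokens to the model, at most n_batch at a time.
static void consume_pending(std::vector<int> & embd, int n_batch, int & n_past) {
    for (size_t i = 0; i < embd.size(); i += (size_t) n_batch) {
        int n = (int) std::min((size_t) n_batch, embd.size() - i);
        eval_tokens(embd.data() + i, n, n_past);
        n_past += n;
    }
    embd.clear();
}
```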
I am trying to output just the sentence embedding for a given input, instead of any new generated text. I think this should be rather straightforward but figured someone more...
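One common recipe, independent of llama.cpp's internals, is to mean-pool the per-token hidden states into a single sentence vector. A generic, self-contained sketch, assuming `token_embd` holds `n_tokens` rows of `n_embd` floats in row-major order:

```cpp
#include <vector>

// Average the per-token embeddings into one fixed-size sentence embedding.
static std::vector<float> mean_pool(const std::vector<float> & token_embd,
                                    int n_tokens, int n_embd) {
    std::vector<float> out(n_embd, 0.0f);
    for (int t = 0; t < n_tokens; ++t) {
        for (int i = 0; i < n_embd; ++i) {
            out[i] += token_embd[(size_t) t * n_embd + i];
        }
    }
    for (int i = 0; i < n_embd; ++i) {
        out[i] /= (float) n_tokens;
    }
    return out;
}
```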
This appears to solve https://github.com/ggerganov/llama.cpp/issues/153, where the error `ggml_new_tensor_impl: not enough space in the context's memory pool` is thrown in interactive mode if using a larger context size. At least...
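For background on the error: ggml allocates its memory pool once up front, so a pool sized for the default context overflows at larger `n_ctx`. A sketch of sizing logic that scales with the context length, with made-up constants rather than the values from this PR:

```cpp
#include <cstddef>

// Illustration of the failure mode, not the repository's actual fix:
// the pool must grow with n_ctx because the KV cache stores one key and
// one value vector of n_embd floats per layer per position.
static size_t estimate_pool_size(int n_ctx, int n_embd, int n_layer) {
    size_t kv_bytes = (size_t) n_ctx * n_layer * 2u * n_embd * sizeof(float);
    size_t overhead = 64u * 1024u * 1024u; // placeholder graph/scratch overhead
    return kv_bytes + overhead;
}
```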
I do not expect this to be merged, but I figured it might help others, though I don't know if this is the right place. This logs information to a...
### Discussed in https://github.com/ggerganov/llama.cpp/discussions/234 Originally posted by **ShouNichi** March 17, 2023 After running `git checkout 84d9015` and `make`, there is no output (only the model loading message) in Termux. `git...
In the PR that was resolved (#132), the action that publishes the packages used the user and token of the author of the commit on master. In this case,...
It would be great to start doing this kind of quantitative analysis of `ggml`-based inference: https://bellard.org/ts_server/ It looks like Fabrice evaluates the models using something called LM Evaluation Harness: https://github.com/EleutherAI/lm-evaluation-harness...
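Whatever harness is used, the headline number for this kind of analysis is usually perplexity, the exponential of the negative mean log-probability the model assigns to each ground-truth token. A toy sketch of the computation (the input values are made-up numbers):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// ppl = exp(-(1/N) * sum_i log p(x_i | x_<i))
static double perplexity(const std::vector<double> & logprobs) {
    double sum = 0.0;
    for (double lp : logprobs) {
        sum += lp;
    }
    return std::exp(-sum / (double) logprobs.size());
}

int main() {
    std::vector<double> lp = { -2.3, -0.7, -1.1 }; // toy token log-probs
    std::printf("ppl = %.3f\n", perplexity(lp));
    return 0;
}
```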