llama.cpp
LLM inference in C/C++
failed to tokenize string! `system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0...`
This change modifies the `quantize.sh` script so that it runs correctly on different platforms (including Windows under WSL).
bugfix: `std::string` messes up the vocab. OS: CentOS 7; compiler: gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Hi everyone, I took a stab at adding embedding mode, where we print the sentence embedding for the input instead of generating more tokens. If I only add the compute...
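For context, a minimal sketch of what consuming such a mode might look like, assuming the PR exposes the result through accessors along the lines of `llama_get_embeddings` and `llama_n_embd` (the names and signatures here are assumptions about the API, not the PR's final code, and the snippet would link against llama.cpp):

```cpp
// Hypothetical sketch: print the sentence embedding after evaluating the
// prompt. The accessors below are assumed; the API added by the PR may differ.
#include <cstdio>

extern "C" {
    struct llama_context;
    float * llama_get_embeddings(struct llama_context * ctx); // assumed accessor
    int     llama_n_embd(struct llama_context * ctx);         // assumed accessor
}

void print_embedding(struct llama_context * ctx) {
    const int     n_embd = llama_n_embd(ctx);
    const float * emb    = llama_get_embeddings(ctx); // valid after an eval call

    for (int i = 0; i < n_embd; i++) {
        printf("%f ", emb[i]);
    }
    printf("\n");
}
```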
This builds on my [other PR](https://github.com/ggerganov/llama.cpp/pull/267) to implement a very simple TCP mode. The new mode first loads the model and then listens for TCP connections on a port. When a...
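The networking half of such a mode is plain POSIX sockets; here is a self-contained sketch of the accept loop, where `serve_client` is a placeholder (it just echoes bytes back) standing in for the part that wires a connection up to the inference loop, and the port is an arbitrary example:

```cpp
// Minimal POSIX TCP accept loop: set up once, then handle one connection
// at a time, keeping the (already loaded) model resident across connections.
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

// Placeholder handler: echoes input back. In the PR this would instead feed
// the received prompt to the model and stream generated tokens out.
void serve_client(int fd) {
    char    buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        write(fd, buf, n);
    }
}

int main() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int yes = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(8080); // example port

    if (bind(srv, (sockaddr *) &addr, sizeof(addr)) != 0 || listen(srv, 1) != 0) {
        perror("bind/listen");
        return 1;
    }

    for (;;) {
        int cli = accept(srv, nullptr, nullptr);
        if (cli < 0) continue;
        serve_client(cli);
        close(cli);
    }
}
```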
Add: https://github.com/gyunggyung/OpenMLLM Use: https://github.com/gyunggyung/KoAlpaca.cpp
Resolves https://github.com/ggerganov/llama.cpp/issues/240. WIP. This needs to be able to:
1. Configure custom model folders.
2. Adjust settings for running variants of the Alpaca model and make corresponding changes in the...
This is a prototype of computing perplexity over the prompt input. It does so by using `n_ctx - 1` tokens as the input to the model and computing the softmax...
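The arithmetic behind that is the exponentiated average negative log-likelihood: perplexity = exp(-(1/N) Σ log p(token_i | context_i)). A self-contained sketch of the softmax/perplexity step, operating on made-up logits rather than real model output:

```cpp
// Compute perplexity = exp(-(1/N) * sum_i log p(token_i | context_i)).
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Log-probability of token `tok` under a softmax over `logits`,
// subtracting the max logit first for numerical stability.
double log_softmax_at(const std::vector<double> & logits, int tok) {
    double max = logits[0];
    for (double l : logits) max = std::max(max, l);
    double sum = 0.0;
    for (double l : logits) sum += std::exp(l - max);
    return (logits[tok] - max) - std::log(sum);
}

int main() {
    // One logit row per predicted position; tokens[i] is the observed token.
    std::vector<std::vector<double>> rows   = {{2.0, 0.5, -1.0}, {0.1, 1.5, 0.3}};
    std::vector<int>                 tokens = {0, 1};

    double nll = 0.0; // accumulated negative log-likelihood
    for (size_t i = 0; i < rows.size(); i++) {
        nll -= log_softmax_at(rows[i], tokens[i]);
    }
    printf("perplexity = %f\n", std::exp(nll / rows.size()));
}
```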
After running the command:

```
python3 convert-pth-to-ggml.py /Users/tanish.shah/llama.cpp/models/7B/ 1
```

Error with sentencepiece:

```
Traceback (most recent call last):
  File "/Users/tanish.shah/llama.cpp/convert-pth-to-ggml.py", line 75, in <module>
    tokenizer = sentencepiece.SentencePieceProcessor(fname_tokenizer)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tanish.shah/llama.cpp/env/lib/python3.11/site-packages/sentencepiece/__init__.py", line 447,...
```