
LLM inference in C/C++

Results 1637 llama.cpp issues

This commit adds two new files, Windows-installer.bat and Windows-model_conversion.bat, both of which serve to make using llama.cpp on Windows easier. Windows-installer.bat installs dependencies, such as Python, and Windows-model_conversion.bat converts the...

enhancement
build

```python
#!/usr/bin/env python3
import os
import sys

if not (len(sys.argv) == 2 and sys.argv[1] in ["7B", "13B", "30B", "65B"]):
    print(f"\nUsage: {sys.argv[0]} 7B|13B|30B|65B [--remove-f16]\n")
    sys.exit(1)

for i in os.listdir(f"models/{sys.argv[1]}"):
    if i.endswith("ggml-model-f16.bin"):
        ...
```

enhancement

Added install instructions for the versions of `torch` and `sentencepiece` missing from the pip repo on the latest Python 3; used to get this working on Python 3.11.0.

Drop torch, do not load the whole file into memory, process files in parallel, and use separate threads for reading and writing.

enhancement
performance
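The streaming approach described above can be sketched in plain Python. This is a minimal illustration, not the PR's actual conversion code: it uses an identity copy as a placeholder transform, reads the source in fixed-size chunks on one thread, and writes on another through a bounded queue so the whole file is never resident in memory.

```python
#!/usr/bin/env python3
# Sketch: stream a large file in chunks through a bounded queue, with
# separate reader and writer threads, so the whole file is never held
# in memory. The transform step is a placeholder (identity copy); a real
# conversion script would convert each chunk before writing it.
import queue
import threading

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time
SENTINEL = None       # signals end of stream to the writer

def reader(src_path, q):
    with open(src_path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            q.put(chunk)
    q.put(SENTINEL)

def writer(dst_path, q):
    with open(dst_path, "wb") as f:
        while (chunk := q.get()) is not SENTINEL:
            f.write(chunk)

def stream_copy(src_path, dst_path, max_queued=8):
    # maxsize bounds memory use to max_queued * CHUNK_SIZE
    q = queue.Queue(maxsize=max_queued)
    t_r = threading.Thread(target=reader, args=(src_path, q))
    t_w = threading.Thread(target=writer, args=(dst_path, q))
    t_r.start()
    t_w.start()
    t_r.join()
    t_w.join()
```

The bounded queue is what keeps peak memory flat: if the writer falls behind, `q.put` blocks the reader instead of letting chunks pile up.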

Adds the `--ignore-eos` switch, which prevents generation of the end-of-text (EOS) token. This can be useful for avoiding unexpected terminations in interactive mode and for forcing the model...

enhancement
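One common way to implement such a switch (a sketch, not the actual llama.cpp code) is to push the EOS token's logit to negative infinity before sampling, so neither greedy nor probabilistic sampling can ever select it:

```python
# Sketch: suppress the EOS token by masking its logit before sampling.
# The token id and the logit values are illustrative assumptions, not
# llama.cpp's actual vocabulary or implementation.
import math

EOS_TOKEN = 2  # hypothetical end-of-text token id

def pick_token(logits, ignore_eos=False):
    """Greedy sampling over a list of logits, optionally banning EOS."""
    if ignore_eos:
        logits = list(logits)            # don't mutate the caller's list
        logits[EOS_TOKEN] = -math.inf    # EOS can never win the argmax
    return max(range(len(logits)), key=lambda i: logits[i])
```

For example, `pick_token([0.1, 0.5, 3.0])` returns token 2 (EOS has the highest logit), while `pick_token([0.1, 0.5, 3.0], ignore_eos=True)` returns token 1.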

I improved the quantize script by adding error handling and allowing many models to be selected for quantization at once on the command line. I also converted it to Python for...

enhancement
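The shape of such a wrapper can be sketched as follows. The `./quantize` binary name, the model paths, and the trailing `2` (the q4_0 quantization type) are assumptions for illustration; the point is that each model is attempted in turn and failures are collected instead of aborting the whole batch:

```python
#!/usr/bin/env python3
# Sketch of a multi-model quantize wrapper with error handling. The
# `./quantize` binary, the models/ layout, and the "2" type argument
# are assumptions, not the script's actual interface.
import subprocess
import sys

def quantize_command(model):
    """Build the (hypothetical) quantize invocation for one model dir."""
    return ["./quantize",
            f"models/{model}/ggml-model-f16.bin",
            f"models/{model}/ggml-model-q4_0.bin",
            "2"]

def main(models):
    failed = []
    for model in models:
        try:
            subprocess.run(quantize_command(model), check=True)
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            # Report the failure and keep going with the remaining models.
            print(f"quantization of {model} failed: {e}", file=sys.stderr)
            failed.append(model)
    return failed

if __name__ == "__main__":
    sys.exit(1 if main(sys.argv[1:]) else 0)
```

Catching `CalledProcessError` (non-zero exit) separately from `FileNotFoundError` (missing binary) lets the script distinguish a broken model from a broken setup while still continuing the batch.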

Tried to address slow weight loading. 7B is okay, but 13B is really slow (several minutes), which makes it hard to experiment/prototype with larger models. Replaced `std::ifstream` with C-style file reading using `fopen`...

enhancement
performance

I believe this largely fixes the tokenization issues. The example mentioned in https://github.com/ggerganov/llama.cpp/issues/167 as well as my local tests (e.g. "accurately" should tokenize as `[7913, 2486]`) are fixed by it....

https://github.com/ggerganov/llama.cpp/blob/721311070e31464ac12bef9a4444093eb3eaebf7/main.cpp#L980-L983 This can fail to colorize the last `params.n_batch` part of the prompt correctly because `embd` was just loaded with those tokens and they have not been printed yet.

bug

So. I'm trying to build with CMake on Windows 11 and the thing just stops after it's done loading the model. ![image](https://user-images.githubusercontent.com/4723091/226091364-64a488a7-ebb5-4c24-9dd0-1cb81378008d.png) And apparently, this is a segfault. ![Screenshot_20230318_121935](https://user-images.githubusercontent.com/4723091/226091335-afbf2712-d2b8-4b88-9b44-6b6a43d78565.png) Yay...

bug
duplicate
hardware
model