llama.cpp
LLM inference in C/C++
This commit adds two new files, Windows-installer.bat and Windows-model_conversion.bat, both of which make using llama.cpp on Windows easier. Windows-installer.bat installs dependencies such as Python, and Windows-model_conversion.bat converts the...
```python
#!/usr/bin/env python3
import os
import sys

if not (len(sys.argv) == 2 and sys.argv[1] in ["7B", "13B", "30B", "65B"]):
    print(f"\nUsage: {sys.argv[0]} 7B|13B|30B|65B [--remove-f16]\n")
    sys.exit(1)

for i in os.listdir(f"models/{sys.argv[1]}"):
    if i.endswith("ggml-model-f16.bin"):
        ...
```
Added install instructions for the versions of `torch` and `sentencepiece` that are missing from the pip repo on the latest Python 3 - used to get this working on Python 3.11.0
Drop torch, avoid loading the whole file into memory, process files in parallel, and use separate threads for reading and writing.
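The chunked, threaded read/write scheme described above can be sketched in Python. This is a minimal illustration of the idea, not the actual conversion script; `convert_chunk` is a hypothetical stand-in for the real per-chunk work:

```python
import os
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # stream in 1 MiB chunks instead of loading the whole file


def convert_chunk(data: bytes) -> bytes:
    # Hypothetical placeholder for the real per-chunk conversion work.
    return data


def convert_file(src: str, dst: str) -> None:
    # A bounded queue decouples the reader thread from the writer,
    # so reading and writing overlap without unbounded memory use.
    q: "queue.Queue" = queue.Queue(maxsize=8)

    def reader() -> None:
        with open(src, "rb") as f:
            while chunk := f.read(CHUNK):
                q.put(chunk)
        q.put(None)  # sentinel: no more data

    t = threading.Thread(target=reader)
    t.start()
    with open(dst, "wb") as out:
        while (chunk := q.get()) is not None:
            out.write(convert_chunk(chunk))
    t.join()


def convert_all(pairs: list) -> None:
    # One worker per (src, dst) pair: files are converted in parallel,
    # each with its own reader thread feeding its writer.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda p: convert_file(*p), pairs))
```

The bounded queue is the key design choice: it keeps at most a few chunks in flight, so peak memory stays small regardless of file size.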
Adds the --ignore-eos switch, which prevents generation of the end-of-text (EOS) token. This can be useful to avoid unexpected terminations in interactive mode and to force the model...
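One common way such a switch can work is to mask out the EOS logit before sampling, so the token can never be chosen. The following Python sketch illustrates that idea under stated assumptions; the token id and function name are hypothetical, not llama.cpp's actual code:

```python
import math

EOS_TOKEN = 2  # assumed end-of-text token id for illustration


def apply_ignore_eos(logits: list, ignore_eos: bool) -> list:
    # With the switch enabled, force the EOS logit to -inf so any
    # softmax-based sampler assigns it zero probability.
    if ignore_eos:
        logits = logits.copy()
        logits[EOS_TOKEN] = -math.inf
    return logits
```

With the flag off, the logits pass through untouched; with it on, generation simply continues until another stop condition (such as the token limit) is hit.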
I improved the quantize script by adding error handling and allowing multiple models to be selected for quantization at once on the command line. I also converted it to Python for...
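A multi-model loop with per-model error handling might look like the sketch below. This is an illustration only; the `./quantize` invocation and the `models/<SIZE>/ggml-model-*.bin` layout are assumed from the repo's conventions, not taken from the actual script:

```python
import subprocess
import sys


def quantize_models(models: list, quantize_bin: str = "./quantize") -> int:
    # Quantize each requested model in turn; a failure for one model
    # is reported but does not abort the remaining ones.
    failures = 0
    for m in models:
        src = f"models/{m}/ggml-model-f16.bin"
        dst = f"models/{m}/ggml-model-q4_0.bin"
        try:
            subprocess.run([quantize_bin, src, dst, "2"], check=True)
        except (OSError, subprocess.CalledProcessError) as e:
            print(f"quantization failed for {m}: {e}", file=sys.stderr)
            failures += 1
    return failures
```

Returning a failure count (rather than exiting on the first error) lets a caller quantize, say, 7B and 13B in one invocation and still learn which ones failed.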
Tried to address slow weight loading. 7B is okay, but 13B is really slow (several minutes), which makes it hard to experiment and prototype with larger models. Replaced `std::ifstream` with C-style file reading using `fopen`....
I believe this largely fixes the tokenization issues. The example mentioned in https://github.com/ggerganov/llama.cpp/issues/167 as well as my local tests (e.g. "accurately" should tokenize as `[7913, 2486]`) are fixed by it....
https://github.com/ggerganov/llama.cpp/blob/721311070e31464ac12bef9a4444093eb3eaebf7/main.cpp#L980-L983 This can fail to colorize the last `params.n_batch` part of the prompt correctly because `embd` has just been loaded with those tokens, which have not been printed yet.
So, I'm trying to build with CMake on Windows 11 and the thing just stops after it's done loading the model. And apparently, this is a segfault. Yay...