llama.cpp
LLM inference in C/C++
Bug encountered when running `python3 convert-pth-to-ggml.py models/7B/ 1`:
```
llama.cpp % python3 convert-pth-to-ggml.py models/7B/ 1
Traceback (most recent call last):
  File "/Users/jjyuhub/llama.cpp/convert-pth-to-ggml.py", line 69, in <module>
    hparams = json.load(f)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/json/__init__.py",...
```
When I build, the Makefile detects my M1 Max as x86_64. This is because I have GNU coreutils `uname` on my `PATH`, which announces my architecture as `arm64` (whereas the...
I propose refactoring `main.cpp` into a library (`llama.cpp`, compiled to `llama.so`/`llama.a`/whatever) and making `main.cpp` a simple driver program. A simple C API should be exposed to access the model, and...
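For the sake of discussion, here is a rough sketch of what such a C API might look like. Every name and signature below is hypothetical, invented only to illustrate the shape of a minimal interface, not an existing one:
```
// llama.h -- hypothetical public C API for the proposed library.
// None of these names are claimed to exist in the repository; they
// only sketch what a minimal interface could look like.
#ifndef LLAMA_H
#define LLAMA_H

#ifdef __cplusplus
extern "C" {
#endif

typedef struct llama_context llama_context;  // opaque model + inference state

// Load a ggml model file; returns NULL on failure.
llama_context * llama_init(const char * model_path, int n_threads);

// Feed a prompt and sample up to n_predict tokens into `out`.
// Returns the number of bytes written, or -1 on error.
int llama_generate(llama_context * ctx,
                   const char * prompt,
                   char * out, int out_size,
                   int n_predict);

// Free the model weights and all associated buffers.
void llama_free(llama_context * ctx);

#ifdef __cplusplus
}
#endif

#endif // LLAMA_H
```
With a header along these lines, `main.cpp` would shrink to argument parsing plus three calls: `llama_init`, `llama_generate`, `llama_free`.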
Apologies if GitHub Issues is not the right place for this question, but do you know if anyone has hosted the ggml versions of the models? The disk space required...
Hi, I'm getting strange behaviour and a strange answer:
```
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -p "User: how many wheels have a car?"
main: seed =...
```
By deleting line 155 (`#include <immintrin.h>`) in ggml.c, it works just fine on RISC-V. Maybe this can be handled in CMake?
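A minimal sketch of how that include could be guarded at build time instead of deleted outright; the exact macro list here is an assumption, the point is simply that `immintrin.h` exists only on x86 targets:
```
/* Sketch: pull in the x86 SIMD intrinsics header only on x86 targets,
   so RISC-V (and other non-x86) builds skip it. The macro set below
   is illustrative, not the project's actual guard. */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#endif
```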
Without "static" quantifier, it fails to compile in clang ``` ld.lld: error: undefined symbol: packNibbles >>> referenced by ggml.c:520 (llama_cpp/ggml.c:520) >>> .../llama_cpp/__ggml__/__objects__/ggml.c.pic.o:(quantize_row_q4_0) ld.lld: error: undefined symbol: bytesFromNibbles >>> referenced by...