llama.cpp
LLM inference in C/C++
The llama code includes `view_as_real`: https://github.com/facebookresearch/llama/blob/main/llama/model.py#L68. How does convert-pth-to-ggml.py handle this part of the weights?
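For context, the operation in question in `apply_rotary_emb` looks roughly like the sketch below (illustrative only; the broadcast reshape of `freqs_cis` is omitted, and the function name and shapes are assumptions, not code from the conversion script):

```python
# Illustrative sketch of the view_as_complex / view_as_real round-trip used by
# apply_rotary_emb in the reference model.py. It operates on activations at
# runtime, not on checkpoint weights.
import torch

def rotate_sketch(xq: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # pair up the last dimension into complex numbers: (..., head_dim) -> (..., head_dim/2)
    xq_complex = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    # rotate by the precomputed complex frequencies, then flatten back to real
    return torch.view_as_real(xq_complex * freqs_cis).flatten(-2).type_as(xq)
```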
Edit: Most of the below is now outdated. This PR aims to do two things: - Replace EOS with a newline to prevent context/memory being flushed by EOS in interactive mode - Better...
In `convert-pth-to-ggml.py`, `dir_model` is something like `models/7B` or `models/7B/`, and `tokenizer.model` is expected under the model's parent dir. When `dir_model` is a symlink, `f"{dir_model}/../tokenizer.model"` would not be found. Let's use the model's...
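One way to address this (a hypothetical sketch, not the actual patch) is to compute the parent directory lexically instead of going through `..`, which the filesystem resolves against the symlink's target:

```python
# Hypothetical sketch: locate tokenizer.model via the lexical parent of
# dir_model, so a symlinked models/7B still finds models/tokenizer.model.
import os

dir_model = "models/7B/"                               # may be a symlink, may end in "/"
parent = os.path.dirname(os.path.normpath(dir_model))  # lexical parent: "models"
fname_tokenizer = os.path.join(parent, "tokenizer.model")
```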
Just a few minor cleanups. 1. Mac (Intel) related: * `$(UNAME_M)` shows "x86_64". * `shell sysctl -n hw.optional.arm64` outputs an error that should be ignored. * Add an additional comment on `-framework...
As per https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L1066 the EOS flag in interactive mode simply causes `is_interacting` to switch on, and so it serves as a way to end the current series of tokens and...
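In other words, the control flow around that line amounts roughly to the following (a simplified Python sketch of the C++ logic in main.cpp; names and the return convention are illustrative, not the actual implementation):

```python
# Simplified sketch: in interactive mode, EOS does not terminate generation;
# it only switches is_interacting on so the user gets control back.
EOS_TOKEN_ID = 2  # EOS id in the LLaMA tokenizer

def handle_eos(interactive: bool) -> tuple[bool, bool]:
    """Return (is_interacting, stop_generation) after sampling an EOS token."""
    if interactive:
        return True, False   # hand control back to the user, keep running
    return False, True       # non-interactive: end generation
```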
I have seen that support for the Alpaca model has been added in the master branch, so I have included the model in the Docker scripts. Now you can try it like...
It would be cool to be able to lean on the Neural Engine. Even if it wasn't much faster, it would still be more energy efficient, I believe.
129c7d1e (#20) added a repetition penalty that prevents the model from running into loops. Here are a few suggestions for possible enhancements: * One issue with the interactive mode is...
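For reference, the core idea of such a repetition penalty can be sketched like this (a simplified Python sketch under assumed parameter names, not the C++ code from that commit):

```python
# Simplified sketch of a repetition penalty: logits of tokens already present
# in the recent context are pushed down before sampling.
def apply_repetition_penalty(logits: list[float], last_tokens: list[int],
                             penalty: float = 1.3) -> list[float]:
    for tok in set(last_tokens):
        if logits[tok] > 0:
            logits[tok] /= penalty   # shrink positive logits
        else:
            logits[tok] *= penalty   # push negative logits further down
    return logits
```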
Hello, I've tried out the Alpaca model, but after a while an error comes up, I believe stating: "zsh: segmentation fault ./main -m ./models/alpaca/ggml-alpaca-7b-q4.bin --color -f -ins". Thanks. Code: ./main...
After PR #252, all base models need to be converted again. For me, this is a big breaking change. The LoRA and/or Alpaca fine-tuned models are not compatible anymore...