llama.cpp
LLM inference in C/C++
Firstly, thank you for the awesome project. I'm new to LLMs, so I hope this suggestion makes sense. LoRA is a technique used to reduce the number of parameters during...
> Not all of these checksums seem to be correct. Are they calculated with the "v2" new model format after the tokenizer change? PR: https://github.com/ggerganov/llama.cpp/pull/252 Issue: https://github.com/ggerganov/llama.cpp/issues/324 > > For...
So I am looking at https://github.com/antimatter15/alpaca.cpp and I see they are already running 30B Alpaca models, while we are struggling to run 7B due to the recent tokenizer updates. I...
We might want to add a Nix CI job to ensure it doesn't get desynced. @prusnak thoughts?
Otherwise the tests may be run and pass even if the build has errors, and the step itself can succeed despite a broken build.
Hey, I noticed the API is running on C++. Were the original weights in Python or C++? If in Python, I would think they were in PyTorch, since that is...
When I execute this command: `make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512` an error was reported: `llama_init_from_file: failed`...
Hello, I noticed something when trying the chat with Bob: I always get the first token as empty. 1 -> '' 4103 -> ' Trans' 924 -> 'cript'...
Hey! There should be a simple example of how to use the new C API (like one that simply takes a hardcoded string and runs llama on it until \n...
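A rough sketch of what such an example might look like, assuming the C API as declared in `llama.h` around the time of this issue (`llama_init_from_file`, `llama_tokenize`, `llama_eval`, `llama_sample_top_p_top_k`); the exact names, signatures, and sampling parameters should be checked against the current header, and the model path here is just a placeholder:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include "llama.h"

int main(void) {
    // Sketch only: assumes the early llama.h API; verify signatures
    // against the header in the tree you are building.
    struct llama_context_params params = llama_context_default_params();
    struct llama_context * ctx =
        llama_init_from_file("models/7B/ggml-model-q4_0.bin", params);
    if (ctx == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // Tokenize a hardcoded prompt (add_bos = true prepends the BOS token).
    const char * prompt = "Building a website can be done in 10 simple steps:";
    llama_token tokens[256];
    int n = llama_tokenize(ctx, prompt, tokens, 256, true);

    // Evaluate the prompt, then sample tokens until a newline appears.
    llama_eval(ctx, tokens, n, 0, /*n_threads=*/4);
    for (int n_past = n; n_past < 128; n_past++) {
        llama_token tok = llama_sample_top_p_top_k(
            ctx, NULL, 0, /*top_k=*/40, /*top_p=*/0.95f,
            /*temp=*/0.8f, /*repeat_penalty=*/1.1f);
        const char * piece = llama_token_to_str(ctx, tok);
        if (strchr(piece, '\n')) break;   // stop at the first newline
        printf("%s", piece);
        llama_eval(ctx, &tok, 1, n_past, 4);
    }

    llama_free(ctx);
    return 0;
}
```

Note this needs a built `libllama` and a converted model on disk to actually run.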
Previously, `python quantize.py --models-path .. 7B 13B` would fail to find `../7B/ggml-model-f16.bin`. Now, it computes the absolute path to the models and uses that instead, which works.
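The idea behind the fix can be sketched like this (the helper name and layout are illustrative, not the actual `quantize.py` internals):

```python
from pathlib import Path

def model_f16_path(models_path: str, model_name: str) -> Path:
    """Resolve the f16 model file under --models-path as an absolute path.

    A relative path like ``../7B/ggml-model-f16.bin`` breaks when the
    script is invoked from a different working directory; resolving it
    up front makes the lookup independent of the current directory.
    """
    return (Path(models_path) / model_name / "ggml-model-f16.bin").resolve()
```

For example, `model_f16_path("..", "7B")` yields an absolute path ending in `7B/ggml-model-f16.bin` regardless of where the script was launched from.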