
LLM inference in C/C++

Results: 1628 llama.cpp issues

Please include the `ggml-model-q4_0.bin` model to actually run the code:

```
% make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t...
```

bug
model

Hi there, I downloaded my LLaMA weights through BitTorrent and tried to convert the 7B model to ggml FP16 format:

```
$ python convert-pth-to-ggml.py models/7B/ 1
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty....
```

* Ran into this error on a MacBook Pro M1:

  ```
  ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
  [1] 18452 illegal hardware instruction  ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
  ```

* What I've tried: *...

Hello! I noticed that the model loader is not using buffered I/O, so I added a piece of code for buffering. I measured the loading time only for LLaMA 7B...
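As a rough illustration of the technique being described (not the actual patch from this issue; the buffer size, file handling, and the `fread` loop below are assumptions made for the sketch), buffered I/O here means giving stdio a large user-space buffer so each underlying `read()` call moves much more data than the default:

```cpp
// Minimal sketch: read a model file through a large stdio buffer.
// Illustrative only; not the loader code from llama.cpp.
#include <cstdio>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <model-file>\n", argv[0]);
        return 1;
    }

    FILE * f = std::fopen(argv[1], "rb");
    if (!f) {
        std::perror("fopen");
        return 1;
    }

    // 1 MiB buffer (the size is an arbitrary choice for this sketch);
    // setvbuf must be called before the first read on the stream.
    std::vector<char> iobuf(1 << 20);
    std::setvbuf(f, iobuf.data(), _IOFBF, iobuf.size());

    std::vector<char> chunk(1 << 16);
    size_t total = 0;
    size_t n;
    while ((n = std::fread(chunk.data(), 1, chunk.size(), f)) > 0) {
        total += n; // a real loader would parse tensor data here
    }

    std::fclose(f);
    std::printf("read %zu bytes\n", total);
    return 0;
}
```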

Hi, first of all, thanks for the tremendous work! I wanted to ask: compared to your demo, when I run the same input sentence, the speed difference is...

need more info

This pull request adds a simple [Nix Flake](https://nixos.wiki/wiki/Flakes) for building and distributing the binaries of this repository in a combined package. The `main` binary can be executed like this (assuming...

- Combined nmake/Unix Makefile.
- `_alloca` instead of a variable-size array.
- Do not do math on `void*`; could cast to `char*`, but in this case, move the `uint8_t*` cast....
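On the `void*` point: pointer arithmetic on `void*` is a GNU extension that MSVC rejects, which is why casting to a byte-sized pointer matters for a combined nmake/Unix build. A minimal sketch of the portable pattern (a standalone example, not this PR's actual diff):

```cpp
// Portable byte-offset arithmetic: cast void* to a byte pointer first.
#include <cstdint>
#include <cstdio>

int main() {
    uint8_t storage[64] = {0};
    void * base = storage;

    // Non-portable: `base + 16` on a void* compiles under GCC/Clang
    // as an extension but is an error under MSVC.

    // Portable: cast to uint8_t* (or char*) before offsetting.
    uint8_t * p = (uint8_t *) base + 16;
    *p = 42;

    std::printf("offset byte = %d\n", *p);
    return 0;
}
```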

Heya! A friend showed this to me and I'm trying to get it to work myself on Windows 10. I've applied the changes as seen in #22 to get it to...

Would love to see a faster, more memory-efficient attention implementation, such as Flash Attention. :)

enhancement