llama.cpp

LLM inference in C/C++

Results: 1628 llama.cpp issues, sorted by recently updated

Idea from https://github.com/ggerganov/llama.cpp/issues/23#issuecomment-1465308592: we can add a `--cache_prompt` flag that, if set, dumps the computed KV cache from prompt processing to a file on disk with a name... (a sketch of one possible approach follows after the labels below).

enhancement
help wanted
good first issue
high priority
🦙.
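
A minimal sketch of what such a flag might do, assuming the implementation can expose the KV cache as a flat byte buffer. Every name below (`cache_path_for`, `dump_kv_cache`, `kv_data`, `kv_size`) is hypothetical and not part of llama.cpp's actual API:

```cpp
// Hypothetical sketch: persist a prompt's KV cache to disk, keyed by a hash
// of the prompt text. kv_data/kv_size stand in for whatever buffer the
// implementation exposes; they are illustrative, not llama.cpp's real API.
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>

static std::string cache_path_for(const std::string & prompt) {
    // Name the cache file after a hash of the prompt so identical prompts
    // map to the same file.
    const std::size_t h = std::hash<std::string>{}(prompt);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "prompt-%zx.kvcache", h);
    return buf;
}

static bool dump_kv_cache(const std::string & prompt,
                          const std::uint8_t * kv_data, std::size_t kv_size) {
    const std::string path = cache_path_for(prompt);
    std::FILE * f = std::fopen(path.c_str(), "wb");
    if (!f) {
        return false;
    }
    const bool ok = std::fwrite(kv_data, 1, kv_size, f) == kv_size;
    std::fclose(f);
    return ok;
}
```

Keying the file name on a hash of the prompt means a later run with an identical prompt can locate and reload the dump instead of recomputing the caches.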

I cloned the GitHub repository and ran the make command, but was unable to get the cpp files to compile successfully. Any help or suggestions would be appreciated. Terminal output:...

`dotprod` extensions aren't available on some ARM CPUs (e.g. the Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in 84d9015 if...
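
One way to perform such a check at runtime on aarch64 Linux is via the kernel's HWCAP bits. This is an illustrative sketch of the general technique under that assumption, not the actual change from the commit above:

```cpp
// Illustrative sketch: detect the ARMv8.2 dot-product extension at runtime
// on aarch64 Linux via getauxval(). Other platforms (macOS, 32-bit ARM)
// need their own checks; this is one possible approach, not the PR's patch.
#include <stdio.h>
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

static int cpu_has_dotprod(void) {
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_ASIMDDP) != 0;
#else
    return 0; // assume unavailable when we cannot detect it
#endif
}

int main(void) {
    printf("dotprod: %s\n", cpu_has_dotprod() ? "available" : "unavailable");
    return 0;
}
```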

Everything's OK until this step:

python3 convert-pth-to-ggml.py models/7B/ 1
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': 32000}
n_parts = 1
Processing part 0
Killed

models/7B/ggml-model-f16.bin isn't...

I'm not an expert on licenses, but if you attribute Facebook in the README and description, you essentially admit/imply that this repo is a modification of their repo. Facebook's repo...

First of all, thank you for the effort of the entire community; the work you do is impressive. I'm going to try to do my bit by dockerizing this client...

Now that we have a shiny new CMake frontend, can we:
- eliminate the Makefile?
- document the CMake build instructions?
As far as I know, users might use the make...

enhancement

Both the `ggml-model-q4_0` and `ggml-model-f16` models produce garbage output on my M1 Air (8 GB), using the 7B LLaMA model. I've seen the quantized model having problems, but I doubt the...

Add a banner with a C++ llama logo to the `README.md`. ![banner](https://user-images.githubusercontent.com/4641499/225103864-afc1483a-677d-440a-b71e-9c5842c12268.png) Preview here: [https://github.com/leszekhanusz/llama.cpp/tree/readme_llama_banner](https://github.com/leszekhanusz/llama.cpp/tree/readme_llama_banner). Current discussion is in issue #105. The text can be changed if needed; suggestions welcome. The...

I found that the LLaMA-7B model shuts down unexpectedly when the number of tokens in the prompt reaches some value, approximately 500; this cannot be...
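
A plausible explanation, not confirmed in the report above, is that the prompt is running up against the context window (early llama.cpp builds defaulted to a 512-token context). A minimal sketch of a guard that truncates over-long prompts, where `check_prompt_fits`, `n_ctx`, and `n_predict` are all hypothetical names rather than the repo's actual variables:

```cpp
// Hypothetical guard: truncate prompts that would overflow the model's
// context window instead of letting evaluation fail partway through.
#include <cstdio>
#include <vector>

static bool check_prompt_fits(std::vector<int> & tokens, int n_ctx, int n_predict) {
    // Leave room in the context for the tokens we intend to generate.
    const int max_prompt = n_ctx - n_predict;
    if ((int) tokens.size() <= max_prompt) {
        return true;
    }
    std::fprintf(stderr, "prompt has %zu tokens but only %d fit; truncating\n",
                 tokens.size(), max_prompt);
    tokens.resize(max_prompt > 0 ? (std::size_t) max_prompt : 0);
    return false;
}
```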