
LLM inference in C/C++

Results: 1628 llama.cpp issues

I've been testing your code with 1 to 8 threads and the output is always different. The speed does not depend on the number of threads; on the contrary, 4...

In this case the llama.cpp and the LLaMA tokenizers produce different output:

```
main: prompt: 'This is 🦙.cpp'
main: number of tokens in prompt = 10
     1 -> ''
  4013...
```

bug

It's really annoying that I have to restart the program every time it quits via **[end of text]** or by exceeding the context limit, as I need to reload the model, which is...

Hi team, I was playing with interactive mode for a couple of hours. Pretty impressive. Besides what's mentioned in #145, it might not be too far off to plug this into an endpoint / functional...

Thank you very much for the possibility of running the model on my MacBook Air M1. I've been testing various parameters and I'm happy even with the 7B model. However,...

As suggested in #146, we are able to save a lot of memory by using float16 instead of float32. I implemented the suggested changes, and tested with the 7B and 13B...

No clue but I think it may work faster

enhancement
performance
hardware

I have no clue about this, but I saw that chatglm-6b was published, which should run on a CPU with 16 GB of RAM, albeit very slowly: https://huggingface.co/THUDM/chatglm-6b/tree/main Would it be possible to...

enhancement
model

Hi, which models do I really need? I have these: is only the 7B folder necessary, for example? Does each model give different results? I don't understand if I need only one...

need more info
model