llama.cpp
LLM inference in C/C++
I've been testing your code from 1 to 8 threads and the output is always different. The speed does not depend on the number of threads. On the contrary, 4...
In this case the llama.cpp and the llama tokenizers produce different output: ``` main: prompt: 'This is 🦙.cpp' main: number of tokens in prompt = 10 1 -> '' 4013...
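One way to get a reference tokenization to compare against is to run the original tokenizer.model through the SentencePiece C++ library. The sketch below is not part of llama.cpp; it assumes that library is installed and that the model file sits at models/tokenizer.model (adjust the path as needed), and it does not add the leading BOS token that llama.cpp prepends.

```cpp
// Print the reference tokenization of a prompt using the original tokenizer.model,
// via the SentencePiece C++ library (https://github.com/google/sentencepiece).
// Illustration only, for comparing against llama.cpp's own tokenizer output.
#include <cstdio>
#include <string>
#include <vector>
#include <sentencepiece_processor.h>

int main() {
    sentencepiece::SentencePieceProcessor sp;
    if (!sp.Load("models/tokenizer.model").ok()) {   // path is an assumption
        fprintf(stderr, "failed to load tokenizer.model\n");
        return 1;
    }

    // Encode the same prompt used in the report; no BOS is added here,
    // so the id list may be offset by one relative to llama.cpp's output.
    const std::vector<int> ids = sp.EncodeAsIds("This is 🦙.cpp");

    printf("number of tokens = %zu\n", ids.size());
    for (const int id : ids) {
        printf("%d -> '%s'\n", id, sp.IdToPiece(id).c_str());
    }
    return 0;
}
```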
It's really annoying that I have to restart the program every time it quits on **[end of text]** or exceeds the context limit, as I need to reload the model, which is...
Hi team, I was playing with interactive mode for a couple of hours, pretty impressive besides what's mentioned in #145. It might not be too far off to plug this into an endpoint / functional...
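To illustrate what such an endpoint could look like, here is a minimal sketch that is not part of the repository: it assumes the single-header cpp-httplib library and simply shells out to the existing ./main binary, with the model path and flags as placeholders.

```cpp
// Minimal HTTP wrapper sketch around the ./main binary.
// Assumes cpp-httplib (https://github.com/yhirose/cpp-httplib); paths/flags are placeholders.
#include <cstdio>
#include <string>
#include "httplib.h"

static std::string run_main(const std::string & prompt) {
    // NOTE: naive quoting, illustration only -- do not expose this to untrusted input.
    std::string cmd = "./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -p \"" + prompt + "\"";
    std::string out;
    if (FILE * pipe = popen(cmd.c_str(), "r")) {
        char buf[4096];
        while (fgets(buf, sizeof(buf), pipe)) {
            out += buf;
        }
        pclose(pipe);
    }
    return out;
}

int main() {
    httplib::Server svr;

    // POST the prompt as the raw request body; the generated text comes back as plain text.
    svr.Post("/completion", [](const httplib::Request & req, httplib::Response & res) {
        res.set_content(run_main(req.body), "text/plain");
    });

    svr.listen("127.0.0.1", 8080);
}
```

A proper integration would of course link against the inference code directly instead of re-loading the model on every request, which ties back to the reload complaint above.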
Thank you very much for making it possible to run the model on my MacBook Air M1. I've been testing various parameters and I'm happy even with the 7B model. However,...
As suggested in #146, we can save a lot of memory by using float16 instead of float32. I implemented the suggested changes and tested with the 7B and 13B...
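For a rough sense of why the change matters, here is a small standalone sketch (not the actual patch) that only computes the weight-storage footprint of the 7B and 13B models at float32 versus float16; the parameter counts are approximate.

```cpp
// Back-of-the-envelope weight storage: float32 (4 bytes/param) vs float16 (2 bytes/param).
// Parameter counts are approximate; this illustrates the saving, it is not the conversion code.
#include <cstdio>

int main() {
    const struct { const char * name; double params; } models[] = {
        { "7B",  7.0e9 },
        { "13B", 13.0e9 },
    };

    for (const auto & m : models) {
        const double gib_f32 = m.params * 4.0 / (1024.0 * 1024.0 * 1024.0);
        const double gib_f16 = m.params * 2.0 / (1024.0 * 1024.0 * 1024.0);
        printf("%-4s  f32: %6.1f GiB   f16: %6.1f GiB   saved: %6.1f GiB\n",
               m.name, gib_f32, gib_f16, gib_f32 - gib_f16);
    }
    return 0;
}
```

Halving the bytes per weight halves the resident size of the tensors, which is where the reported memory savings come from.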
I have no clue about this, but I saw that chatglm-6b was published, which should run on a CPU with 16 GB of RAM, albeit very slowly: https://huggingface.co/THUDM/chatglm-6b/tree/main Would it be possible to...
Hi, what models do I really need? I have these: is only the 7B folder necessary, for example? Does each model give different results? I don't understand if I need only one...