Just a heads up, given that it's been more than a week since the last release: I'm deep in a complete overhaul of a series of behaviors and functions. The core focus is...
A large patch was just integrated into llama.cpp (https://github.com/ggerganov/llama.cpp/pull/2001), another stunning job by @ikawrakow. In the long run we need it; K-quants are better for 7B and have more...
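For anyone unfamiliar with the idea, here is a toy sketch of the block-wise scheme that K-quants build on: weights are grouped into fixed-size blocks, and each block stores low-bit signed integers plus a per-block scale. The block size, bit width, and function names below are purely illustrative, not the actual Q3_K layout.

```python
import numpy as np

def quantize_blocks(w: np.ndarray, block: int = 32, bits: int = 3):
    """Quantize a flat weight array into (block,)-sized groups of signed ints."""
    w = w.reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1                      # e.g. 3 for signed 3-bit
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)     # avoid division by zero
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize_blocks(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_blocks(w)
err = np.sqrt(np.mean((w - dequantize_blocks(q, s)) ** 2))
print(f"RMS quantization error: {err:.4f}")
```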
Opening this as a ticket, as this is quite a large thing to solve. We still suffer a significant slowdown compared to the fast speeds seen for the first 1-2k tokens of context...
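A back-of-envelope sketch of why speed degrades past the first couple of thousand tokens: each new token's attention has to read the entire KV cache, so the per-token attention cost grows linearly with context length. The model dimensions below are purely illustrative.

```python
def attn_flops_per_token(n_ctx: int, n_layers: int = 60,
                         n_heads: int = 64, head_dim: int = 128) -> float:
    # QK^T and attention*V each cost ~2 * n_ctx * head_dim multiply-adds
    # per head per layer, so roughly 4 * n_ctx * head_dim in total.
    return 4.0 * n_ctx * head_dim * n_heads * n_layers

for n_ctx in (512, 2048, 8192):
    print(f"{n_ctx:5d} ctx -> {attn_flops_per_token(n_ctx) / 1e9:6.1f} GFLOPs/token in attention")
```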
I'm currently working on the tokenizer; we need a new one. The llama tokenizer is not suitable: it has problems forming larger tokens, favors smaller ones, and it does...
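To make the complaint concrete, here is a toy sketch of the behavior we'd want instead: a greedy longest-prefix-match tokenizer always forms the largest token the vocabulary allows. The vocabulary and function are hypothetical, purely to illustrate the idea; this is not llama.cpp's tokenizer.

```python
# Hypothetical toy vocabulary for illustration.
VOCAB = {"unbelievable", "un", "believ", "believe", "able",
         "a", "b", "e", "i", "l", "n", "u", "v"}

def longest_match_tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, so larger tokens always win.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

print(longest_match_tokenize("unbelievable"))  # ['unbelievable'], one large token
```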
With each token processed, the inference speed slows down a little. It starts to become noticeable at around 50 tokens on 40B Q3_K and adds up.
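A minimal harness for quantifying the drift, assuming some `generate_one_token` callable as a stand-in for whatever decode step you're testing: record the wall-clock latency of each step and compare the first and last 50.

```python
import time

def measure_slowdown(generate_one_token, n_tokens: int = 500) -> None:
    """Time each decode step and compare early vs. late per-token latency."""
    latencies = []
    for _ in range(n_tokens):
        t0 = time.perf_counter()
        generate_one_token()                      # stand-in for one decode step
        latencies.append(time.perf_counter() - t0)
    head = sum(latencies[:50]) / 50
    tail = sum(latencies[-50:]) / 50
    print(f"first 50: {head * 1e3:.2f} ms/token, last 50: {tail * 1e3:.2f} ms/token "
          f"({tail / head:.2f}x slower)")
```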
### What happened?

Moondream2 is a superb vision model; however, on llama.cpp it performs at a quality below vanilla llava-1. @vikhyat, maybe you'd like to take a look? I compared...
### What happened?

This has already been discussed a bit here: https://github.com/ggerganov/llama.cpp/issues/7938

` `
```
32001 -> ''
259 -> ' '
```

Also `\n`:
```
32001 -> ''
29871 -> ...
```
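One way to reproduce dumps like the above is via the llama-cpp-python bindings, assuming they are installed (the model path here is hypothetical):

```python
from llama_cpp import Llama

# vocab_only loads just the tokenizer, not the weights
llm = Llama(model_path="model.gguf", vocab_only=True)

for text in (" ", "\n"):
    ids = llm.tokenize(text.encode("utf-8"), add_bos=False)
    print(repr(text), "->", ids)
```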
I believe it's 13 different samplers now, and we keep getting more added. I am very sure that the vast majority of users, if not almost everyone, does not understand the differences...
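For anyone trying to build intuition, here is a sketch of three of the most common samplers chained together (top-k, then temperature, then top-p). The order and default values are illustrative, not llama.cpp's exact pipeline.

```python
import numpy as np

def sample_token(logits: np.ndarray, top_k: int = 40, top_p: float = 0.95,
                 temperature: float = 0.8, rng=None) -> int:
    if rng is None:
        rng = np.random.default_rng()
    # top-k: drop everything below the k-th highest logit
    if 0 < top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    # temperature scaling, then softmax
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()
    # top-p (nucleus): keep the smallest set of tokens whose mass reaches top_p
    order = np.argsort(probs)[::-1]
    keep = order[: int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1]
    final = np.zeros_like(probs)
    final[keep] = probs[keep]
    return int(rng.choice(len(probs), p=final / final.sum()))

logits = np.random.randn(32000)   # toy vocabulary-sized logits
print(sample_token(logits))
```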
### Name and Version

all versions 2025

### Operating systems

Linux

### GGML backends

CUDA, Metal

### Hardware

any

### Models

_No response_

### Problem description & steps to reproduce

...
### Name and Version

```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce...
```