
llama : rename n_ctx to kv_size

Open · ggerganov opened this issue 1 year ago · 3 comments

The n_ctx name is causing some confusion, since its actual meaning is the size of the KV cache, while n_ctx_train is the training context of the model.

This change fixes that, but since it is a big one that touches a lot of code, I'm not sure if it's worth merging. Maybe sometime in the future, when the time is right.

Original PR: https://github.com/ggerganov/llama.cpp/pull/5546
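For readers unfamiliar with the two values, here is a minimal sketch against the public llama.h API of that era (exact signatures may have changed since) that prints both; backend init/teardown is omitted for brevity:

```cpp
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (!model) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048; // sizes the KV cache, despite the "ctx" in the name

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) {
        llama_free_model(model);
        return 1;
    }

    // n_ctx: how many cells the KV cache was allocated with (the would-be
    // "kv_size" from this issue); n_ctx_train: the context length the model
    // was trained with. Two different quantities that the shared name blurs.
    printf("n_ctx       (KV cache size)     = %u\n", llama_n_ctx(ctx));
    printf("n_ctx_train (training context)  = %d\n", llama_n_ctx_train(model));

    llama_free(ctx);
    llama_free_model(model);
    return 0;
}
```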

ggerganov · Feb 18 '24

Does n_ctx in whisper.cpp also refer to the size of the KV cache?

bobqianic · Feb 19 '24

In the decoder - yes

ggerganov · Feb 19 '24

I do not agree with this change (but I like the underlying intention of making llama.cpp less confusing).

As I'm working on supporting Mamba in llama.cpp (see #5328), I'd like to warn that renaming n_ctx to kv_size would make it harder to support non-Transformer architectures in a straightforward way.

With Mamba, the KV cache size is tied to the maximum number of distinct sequences processed at the same time, not to the "context size". n_ctx is still used to limit the maximum number of processed tokens, which is fine, because some examples need a fixed size for their buffer of input tokens (e.g. server, lookahead, lookup, parallel, and perplexity when calling llama_batch_init), as in the sketch below.
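A minimal sketch (not taken from the issue, and with illustrative parameter values) of that allocation pattern: the token buffer is sized from n_ctx, the token-processing limit, which is a separate concern from the KV cache capacity:

```cpp
#include "llama.h"

// The examples size their batch from n_ctx; free it later with llama_batch_free().
static llama_batch make_token_batch(const llama_context * ctx) {
    // Today n_ctx happens to equal kv_self.size; with Mamba it would not.
    const uint32_t n_ctx = llama_n_ctx(ctx);

    // Room for up to n_ctx tokens, no embedding input, one sequence id per token.
    return llama_batch_init(n_ctx, /*embd =*/ 0, /*n_seq_max =*/ 1);
}
```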

What I propose instead (and this is what I've started doing in #5328) is to keep n_ctx, but use kv_self.size instead of n_ctx in the places where it's really the KV cache size that is meant, because Mamba breaks the equivalence of n_ctx with kv_self.size.
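A hedged sketch of what that decoupling could look like, using simplified stand-in types rather than the real llama.cpp internals:

```cpp
#include <cstdint>

// Simplified stand-ins, just to show the two quantities the proposal keeps separate.
struct kv_cache { uint32_t size; };                     // capacity of the cache itself
struct context  { uint32_t n_ctx; kv_cache kv_self; };  // n_ctx: token-processing limit

// The limit on tokens processed at once is still expressed via n_ctx.
static bool batch_fits(const context & ctx, uint32_t n_tokens) {
    return n_tokens <= ctx.n_ctx;
}

// The cache capacity is read from kv_self.size, which for Mamba tracks the
// number of distinct sequences rather than the context length.
static uint32_t cache_capacity(const context & ctx) {
    return ctx.kv_self.size;
}
```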

TL;DR: renaming n_ctx to kv_size makes it harder to decouple the context size from the KV cache size.

compilade · Feb 21 '24