
Count/truncate number of tokens before processing

Open jakvb opened this issue 1 year ago • 7 comments

Create a function that takes text as input, converts it into tokens, counts them, and returns the text truncated so that its token count does not exceed a specified maximum, ensuring the result stays within the model's maximum context size.

jakvb avatar Apr 22 '23 18:04 jakvb

@jakvb what's the use case for this?

Something like:

def truncate(llama: Llama, input: str, maxlen: int) -> str:
    # tokenize expects bytes and detokenize returns bytes
    return llama.detokenize(llama.tokenize(input.encode("utf-8"))[:maxlen]).decode("utf-8", errors="ignore")

abetlen avatar Apr 26 '23 00:04 abetlen
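A standalone variant of the same idea that can be unit-tested without loading a model. This is a sketch: the `tokenize`/`detokenize` callables are stand-ins for `Llama.tokenize` and `Llama.detokenize`, which operate on bytes in llama-cpp-python.

```python
from typing import Callable, List

def truncate_to_tokens(
    text: str,
    max_tokens: int,
    tokenize: Callable[[bytes], List[int]],
    detokenize: Callable[[List[int]], bytes],
) -> str:
    """Tokenize text, keep at most max_tokens tokens, and decode back to str."""
    tokens = tokenize(text.encode("utf-8"))
    if len(tokens) <= max_tokens:
        return text
    # errors="ignore" drops any partial multi-byte sequence at the cut point
    return detokenize(tokens[:max_tokens]).decode("utf-8", errors="ignore")
```

With a real model you would pass `llm.tokenize` and `llm.detokenize` directly.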

@jakvb If you're using LangChain as a wrapper around your LlamaCpp model, you can count the number of tokens before calling the LLM with the following method, get_num_tokens:

    def get_num_tokens(self, text: str) -> int:
        tokenized_text = self.client.tokenize(text.encode("utf-8"))
        return len(tokenized_text)

Example usage

num_tokens: int = llama_model.get_num_tokens(question)

fpaupier avatar Dec 15 '23 07:12 fpaupier

And if we don't use LangChain?

Bl4ckh34d avatar Dec 26 '23 01:12 Bl4ckh34d

How about Ollama?

sohaibsoussi avatar Mar 25 '24 08:03 sohaibsoussi

And if we don't use LangChain?

# Where llm is an instance of the Llama class with a model loaded
tokens = llm.tokenize(b"Q: Name the planets in the solar system? A: ")
print(len(tokens))

Zetaphor avatar May 05 '24 22:05 Zetaphor
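Building on the counting snippet above, one use for the count is a pre-flight check against the context window. A sketch, assuming `llm` exposes `tokenize()` and `n_ctx()` as llama-cpp-python's `Llama` does; the `reserve` parameter and function name are illustrative:

```python
def fits_in_context(llm, prompt: bytes, reserve: int = 256) -> bool:
    """Return True if the prompt plus `reserve` tokens reserved for the
    reply fits within the model's context window."""
    return len(llm.tokenize(prompt)) + reserve <= llm.n_ctx()
```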

How would you truncate the final dictionary that includes the system etc?

madprops avatar May 15 '24 06:05 madprops

How would you truncate the final dictionary that includes the system etc?

I actually haven't tested whether this includes the system prompt. One quick way to check would be to tokenize an empty string: if you get tokens back, it includes the system prompt; if you don't, it doesn't. (Note that tokenize adds a BOS token by default, so pass add_bos=False when running this check.)

Then you can just subtract the length of the system prompt dictionary from the start of the final dictionary.

My untested guess would be that using this function directly would not include the system prompt, as you pass that along with your call to the completion function. I would assume the completion function calls this internally with the system prompt prepended. I can't test any of this at the moment but it should be pretty easy to figure out.

Zetaphor avatar May 15 '24 17:05 Zetaphor
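One way to implement the subtraction described above is to keep the system message and drop the oldest non-system messages until the rest fits the budget. A sketch that can be tested without a model; `count_tokens` is a stand-in for a real counter such as `lambda s: len(llm.tokenize(s.encode("utf-8")))`:

```python
def truncate_messages(messages, max_tokens, count_tokens):
    """Keep the system message (if any) plus as many of the most recent
    messages as fit within max_tokens, preserving order."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest, keeping messages while they fit
    for m in reversed(rest):
        cost = count_tokens(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))
```

This counts only message content; a real chat template also spends tokens on role markers and separators, so leave some headroom in max_tokens.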