
Suggested functionality: Estimate by model_type

Open • Somerandomguy10111 opened this issue 2 years ago • 1 comment

First off: great tool, it saved me the headache of trying to trace the function tokens myself. A final touch could be to introduce an option for a token estimator class (tokenizer class?) that takes the model type as an attribute and then uses the tiktoken.encoding_for_model() function to retrieve the encoding.
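To make the suggestion concrete, here is a minimal sketch of what such a class could look like. The name TokenEstimator, the optional encoding argument, and the fallback to cl100k_base for unknown model names are all my own assumptions for illustration; none of them exist in openai-function-tokens today. The only tiktoken calls used are encoding_for_model() and get_encoding().

```python
from __future__ import annotations

from typing import Optional, Protocol


class Encoding(Protocol):
    """Minimal interface this sketch relies on (tiktoken encodings provide both)."""

    def encode(self, text: str) -> list[int]: ...

    def decode(self, tokens: list[int]) -> str: ...


class TokenEstimator:
    """Hypothetical class: resolves the encoding from the model name."""

    def __init__(self, model: str, encoding: Optional[Encoding] = None):
        self.model = model
        # An injected encoding (e.g. a stub in tests) skips the tiktoken lookup.
        self._encoding = encoding

    @property
    def encoding(self) -> Encoding:
        if self._encoding is None:
            import tiktoken  # imported lazily so the class loads without it

            try:
                self._encoding = tiktoken.encoding_for_model(self.model)
            except KeyError:
                # Assumption: unknown model names fall back to cl100k_base.
                self._encoding = tiktoken.get_encoding("cl100k_base")
        return self._encoding

    def encode(self, text: str) -> list[int]:
        return self.encoding.encode(text)

    def decode(self, tokens: list[int]) -> str:
        return self.encoding.decode(tokens)
```

Usage would be something like TokenEstimator("gpt-4").encode("some text"). Resolving the encoding lazily means nothing is looked up until the first call, and a stub encoding can be passed in for tests.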

That way, if OpenAI ever changes the encoding or uses a different encoding for newer models, the package can stay up to date. On a side note, I think the following functions are also useful, e.g. to prevent logging of huge inputs to the model:

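def get_string_tokens(self, the_str: str) -> int:
    # Number of tokens the_str occupies under the current encoding.
    return len(self.encode(the_str))


def get_limited_string(self, the_str: str, max_tokens: int) -> str:
    # Keep only the first max_tokens tokens, then decode back to text.
    encoded_str = self.encode(the_str)
    return self.decode(encoded_str[:max_tokens])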
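As a rough illustration of the logging use case, the sketch below wraps the same truncation idea in a standalone helper. The names truncate_for_log and WhitespaceTokenizer are hypothetical. The whitespace tokenizer is only a toy stand-in (one token per word) so the example runs without tiktoken; in practice any object with matching encode() and decode() methods, such as a tiktoken encoding, should work in its place.

```python
from __future__ import annotations

from typing import Protocol


class Tokenizer(Protocol):
    def encode(self, text: str) -> list[int]: ...

    def decode(self, tokens: list[int]) -> str: ...


class WhitespaceTokenizer:
    """Toy stand-in for a real encoding: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self._vocab: list[str] = []
        self._ids: dict[str, int] = {}

    def encode(self, text: str) -> list[int]:
        tokens = []
        for word in text.split():
            if word not in self._ids:
                self._ids[word] = len(self._vocab)
                self._vocab.append(word)
            tokens.append(self._ids[word])
        return tokens

    def decode(self, tokens: list[int]) -> str:
        return " ".join(self._vocab[t] for t in tokens)


def truncate_for_log(tokenizer: Tokenizer, text: str, max_tokens: int) -> str:
    """Return text unchanged if it fits, else its first max_tokens tokens plus a note."""
    tokens = tokenizer.encode(text)
    if len(tokens) <= max_tokens:
        return text
    dropped = len(tokens) - max_tokens
    head = tokenizer.decode(tokens[:max_tokens])
    return f"{head} ... [{dropped} tokens truncated]"


tok = WhitespaceTokenizer()
print(truncate_for_log(tok, "one two three four five", 2))
# prints: one two ... [3 tokens truncated]
```

One caveat with a real byte-level encoding: cutting at an arbitrary token boundary can split a multi-byte character, so the decoded tail may end in a replacement character.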

Best, Somerandomguy10111

Somerandomguy10111 · Oct 19 '23 11:10

If I get around to it, I will implement it and open a pull request myself.

Somerandomguy10111 · Oct 19 '23 11:10