
Support full token limits

Open IanCal opened this issue 2 years ago • 6 comments

Nice project. I saw a note here:

https://github.com/smol-ai/developer/blob/03dc5d6d28d38fabf2276ab9a002b273529998b8/main.py#L10

You can get the total tokens for a request, then subtract it from the max tokens the model allows. Here's the cookbook implementation: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

Without so many models to support, you can cut it down. Here's an example I grabbed from a codebase of mine that also generates full applications from prompts:

import tiktoken

def num_tokens_from_messages(messages):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model("gpt-4")
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    num_tokens = 0
    for message in messages:
        num_tokens += (
            4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        )
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens
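
For what it's worth, here's a minimal sketch of how you might wire that into a request, assuming the openai 0.27-era Python client and gpt-4's 8,192-token window (MODEL_MAX_TOKENS and create_completion are just placeholder names):

import openai

MODEL_MAX_TOKENS = 8192  # assumed context window for gpt-4; adjust per model

def create_completion(messages):
    # count the prompt tokens, then hand the rest of the budget to the completion
    prompt_tokens = num_tokens_from_messages(messages)
    remaining = MODEL_MAX_TOKENS - prompt_tokens
    return openai.ChatCompletion.create(
        model="gpt-4",
        messages=messages,
        max_tokens=remaining,  # prompt + completion must fit inside the model limit
    )

The count is an estimate, so in practice you'd probably leave a small buffer rather than requesting the exact remainder.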

IanCal avatar May 15 '23 12:05 IanCal

thank you! but quick follow-up - is it actually always a good thing to request the max token length? will it incentivize the model to hallucinate code it doesn't need? smol is better, no?

in other words, if max token length is strictly always better, why would openai want us to reduce it down, ever?

swyxio avatar May 15 '23 16:05 swyxio

Based on this discussion on the OpenAI community site, the max token limit varies by model, and if you exceed the limit for a particular model, the API will return a 400.

Not sure why they're having the API user set it, but there you go.
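
For reference, here's roughly what that failure looks like with the Python client of the time (0.27.x); the prompt and numbers are just placeholders:

import openai

try:
    openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "some long prompt..."}],
        max_tokens=9000,  # more than gpt-4's 8,192-token window
    )
except openai.error.InvalidRequestError as err:
    # the API responds with HTTP 400 and a context_length_exceeded error
    print(err)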

djstunami avatar May 15 '23 18:05 djstunami

yea i'm pretty sure the choice of max tokens affects the output. shorter makes it try to end sooner. this is a feature, not a bug

swyxio avatar May 15 '23 19:05 swyxio

Interesting - it's not an issue I've really hit, since generally I'm trying to get a lot of output (more stuff per prompt = more bang for your buck). It could help explain why I'm getting less useful outputs from some things, though: it doesn't have enough space to create the right hierarchies.

The question, I guess, is whether it will output needlessly large results. It definitely doesn't fill the full amount each time; broadly, I've found gpt-4 to be quite good at doing just what it needs to.

in other words, if max token length is strictly always better, why would openai want us to reduce it down, ever?

TBF it's important when it comes to cost and more, particularly as the context size increases. 8k is a lot; if you really just need a short response, it's good to be able to cap it.
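
To illustrate (a minimal sketch, not from this repo, again assuming the 0.27-era client): capping the completion is just a matter of passing a small max_tokens.

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize this issue in one sentence."}],
    max_tokens=64,  # hard cap on the completion; the output is cut off if it runs over
)
print(response["choices"][0]["message"]["content"])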

IanCal avatar May 16 '23 11:05 IanCal

8k is a lot; if you really just need a short response, it's good to be able to cap it.

yes but notice that the shortness of the requested tokens actually does somewhat affect the generated output - openai is doing something other than plain next-token prediction that shortens the answer when it's about to go over (this is a weak effect, i've observed this rule being broken too, but i've definitely noticed it)

swyxio avatar May 17 '23 06:05 swyxio

yea i'm pretty sure the choice of max tokens affects the output. shorter makes it try to end sooner. this is a feature, not a bug

I found this openai community thread reply:

I asked the support and they clarified that GPT-3 will not attempt to create shorter texts with a smaller max_tokens value. The text will indeed just be cut off. So in my case, I guess it makes sense to use a higher value to have more “wiggle room”.

On the other hand, I can imagine that OpenAI might do some funky stuff in the background, e.g. prefixing your prompt with "Answer in {max_tokens*0.75} or fewer words." It wouldn't do exactly that, as it's been shown the model is not THAT capable, but it might influence the length.

csabag avatar May 18 '23 18:05 csabag