
Token limit

Open szym1998 opened this issue 1 year ago • 2 comments

Issue Description

Problem: When using the new OpenAI library in my asynchronous application, I've encountered an issue related to rate limiting.

Description: When I run my asynchronous application, even just once, the rate limiter starts restricting requests and prevents them from going through. This happens when I set the token limit to 90,000. However, when I increase the token limit to 900,000, the requests go through without issue. It's important to note that my system message, user input, and response typically comprise only around 2,700 tokens in total.

Steps to Reproduce:

1. Install the latest OpenAI library (1.1, I think).
2. Set the token limit to 90,000.
3. Run your asynchronous application.
4. Observe the rate limiter restricting requests.

Expected Behavior:

Requests should not be rate-limited when the token limit is set to 90,000, given that the total token count is well below this limit.

Actual Behavior:

The rate limiter limits requests even when the token limit is set to 90,000.
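For context, the rate limiter in the code below would presumably have been created along these lines. This is a sketch, not taken from the issue: the `request_limit` value is a placeholder, and only `token_limit=90_000` comes from the report above.

```python
from openai import AsyncOpenAI
from openlimit import ChatRateLimiter

client = AsyncOpenAI()

# Assumed setup: token_limit is the per-minute token budget discussed above.
# request_limit=3500 is a hypothetical requests-per-minute value.
rate_limiter = ChatRateLimiter(request_limit=3500, token_limit=90_000)
```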

szym1998 avatar Nov 10 '23 23:11 szym1998

```python
import json

async def create_poem():
    # combined_system_message, data_content, client, and rate_limiter
    # are defined elsewhere in the application.
    chat_params = {
        "model": "gpt-3.5-turbo-1106",
        "temperature": 0.01,
        "response_format": {"type": "json_object"},
        "max_tokens": 2048,
        "messages": [
            {"role": "system", "content": combined_system_message},
            {"role": "user", "content": data_content}
        ]
    }
    try:
        async with rate_limiter.limit(**chat_params):
            completion = await client.chat.completions.create(**chat_params)

        # Extract the content from the first choice of the completion
        content = completion.choices[0].message.content
        # Load the content as JSON
        content = json.loads(content)
        return content
    except Exception as e:
        print(f"An error occurred: {e}")
        return None
```

szym1998 avatar Nov 10 '23 23:11 szym1998

I think the behavior you are seeing is because the single-request token cap is strictly 1/60 of your token limit (90,000 / 60 = 1,500 in your case). The capacity is therefore never enough to fulfill your ~2,700-token request, and it hangs forever. This PR would solve the issue: https://github.com/shobrook/openlimit/pull/10
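The capacity math in that explanation can be checked directly. This is plain arithmetic, not openlimit internals, assuming the per-request capacity cap of 1/60 of the token limit described above:

```python
# Per the issue, one call needs roughly 2,700 tokens of capacity
# (system message + user input + response).
REQUEST_TOKENS = 2700

def can_ever_fit(token_limit_per_min, request_tokens):
    # Under the cap described above, no single request larger than
    # token_limit / 60 can ever be served, so the call hangs forever.
    capacity = token_limit_per_min / 60
    return request_tokens <= capacity

print(can_ever_fit(90_000, REQUEST_TOKENS))   # False: 90,000 / 60 = 1,500 < 2,700
print(can_ever_fit(900_000, REQUEST_TOKENS))  # True: 900,000 / 60 = 15,000 >= 2,700
```

This matches the reported behavior: requests hang at a 90,000 token limit but succeed at 900,000.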

klintan avatar Nov 17 '23 02:11 klintan