azure-search-openai-demo

OpenAI Chat Responses Limited

jrichMQ opened this issue 8 months ago · 1 comment

We see cases where lengthy responses from OpenAI are cut off at a certain token limit. Our current workaround is to ask the LLM to "continue from where it left off." Are there any plans or ways to address this and allow the full response without this workaround?

jrichMQ · Mar 04 '25

We currently have a token limit of 1024:

    # Tokens reserved for the model's answer
    response_token_limit = 1024
    messages = build_messages(
        model=self.chatgpt_model,
        system_prompt=rendered_answer_prompt.system_content,
        past_messages=rendered_answer_prompt.past_messages,
        new_user_content=rendered_answer_prompt.new_user_content,
        # Remaining budget for the prompt; history is truncated to fit
        max_tokens=self.chatgpt_token_limit - response_token_limit,
        fallback_to_default=self.ALLOW_NON_GPT_MODELS,
    )

You can increase the token limit there to your desired maximum if you need longer responses. It just means that the message-history truncation logic may need to cut earlier messages in the conversation to leave enough room for the response.
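As a minimal sketch of that change, reusing the names from the snippet above (the 4096 figure is purely illustrative, not a recommendation; pick a value that fits within your model's context window):

    # Illustrative only: raise the answer budget from 1024 to 4096 tokens.
    # build_messages then has 3072 fewer tokens for the prompt side, so
    # older conversation turns are more likely to be truncated away.
    response_token_limit = 4096
    messages = build_messages(
        model=self.chatgpt_model,
        system_prompt=rendered_answer_prompt.system_content,
        past_messages=rendered_answer_prompt.past_messages,
        new_user_content=rendered_answer_prompt.new_user_content,
        max_tokens=self.chatgpt_token_limit - response_token_limit,
        fallback_to_default=self.ALLOW_NON_GPT_MODELS,
    )

Note also that if the same response_token_limit value is used to cap the completion call itself, raising it here raises the response cap as well; either way, verify that self.chatgpt_token_limit reflects your model's actual context window, since the response budget is carved out of it.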

pamelafox · Mar 06 '25