[Bug]: `llama_index` retries `openai.AuthenticationError`
Bug Description
llama_index retries openai.AuthenticationError
Version
0.10.19
Steps to Reproduce
Make a request to OpenAI via llama_index, using a random string as the API key.
The llama_index logs and the delay clearly demonstrate that the request is retried.
Even setting max_retries=1 on the OpenAI object does not solve the problem.
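A minimal sketch that should reproduce this, calling the LLM directly rather than going through a query engine (the key value is arbitrary):

from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

# A deliberately invalid key: the call should fail immediately with
# openai.AuthenticationError, yet the logs show several retries first.
llm = OpenAI(api_key="not-a-real-key", max_retries=1)
llm.chat([ChatMessage(role="user", content="hello")])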
Relevant Logs/Tracebacks
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py(40)query()
-> return self._query(str_or_query_bundle)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py(187)_query()
-> response = self._response_synthesizer.synthesize(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/response_synthesizers/base.py(195)synthesize()
-> response_str = self.get_response(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py(38)get_response()
-> return super().get_response(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/response_synthesizers/refine.py(163)get_response()
-> response = self._give_response_single(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/response_synthesizers/refine.py(217)_give_response_single()
-> program(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/response_synthesizers/refine.py(68)__call__()
-> answer = self._llm.predict(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/llms/llm.py(253)predict()
-> chat_response = self.chat(messages)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/core/llms/callbacks.py(93)wrapped_llm_chat()
-> f_return_val = f(_self, messages, **kwargs)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/llms/openai/base.py(257)chat()
-> return chat_fn(messages, **kwargs)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/tenacity/__init__.py(289)wrapped_f()
-> return self(f, *args, **kw)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/tenacity/__init__.py(382)__call__()
-> result = fn(*args, **kwargs)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/llama_index/llms/openai/base.py(323)_chat()
-> response = client.chat.completions.create(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/openai/_utils/_utils.py(275)wrapper()
-> return func(*args, **kwargs)
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/openai/resources/chat/completions.py(663)create()
-> return self._post(
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/openai/_base_client.py(1201)post()
-> return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
/Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/openai/_base_client.py(889)request()
-> return self._request(
> /Users/ashishb/Library/Caches/pypoetry/virtualenvs/src-ynB8Yjnz-py3.10/lib/python3.10/site-packages/openai/_base_client.py(964)_request()
-> if retries > 0 and self._should_retry(err.response):
Hey @ashishb! Great to see you around here again. Hope you're doing well!
Thank you for reporting this issue. The behavior you're observing, where llama_index retries openai.AuthenticationError even when max_retries=1 is set on the OpenAI object, is likely due to a custom retry decorator (llm_retry_decorator) applied to the asynchronous chat methods (_achat and _astream_chat). This decorator is defined with its own max_retries parameter set to 6, which overrides the max_retries attribute of the OpenAI class.
Here's the relevant part of the code that defines the llm_retry_decorator:
llm_retry_decorator = create_retry_decorator(
    max_retries=6,
    random_exponential=True,
    stop_after_delay_seconds=60,
    min_seconds=1,
    max_seconds=20,
)
And here's how it's applied to the asynchronous chat methods:
@llm_retry_decorator
async def _achat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponse:
    ...

@llm_retry_decorator
async def _astream_chat(
    self, messages: Sequence[ChatMessage], **kwargs: Any
) -> ChatResponseAsyncGen:
    ...
The llm_retry_decorator is applied directly to these methods, and it uses its own max_retries value (6 in this case), which does not take into account the max_retries attribute set in the OpenAI object's constructor or any other method. This is why, despite setting max_retries=1 in the OpenAI object, the retry behavior observed is that of the decorator's configuration, leading to up to 6 retries in case of an openai.AuthenticationError.
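To see why the instance attribute is ignored, here is a toy illustration (not LlamaIndex code): a tenacity decorator captures its retry count once, when the class body is executed, so an attribute set later in __init__ has no effect.

from tenacity import retry, stop_after_attempt

def make_retry_decorator(max_retries: int):
    # The retry count is fixed here, at decoration time.
    return retry(stop=stop_after_attempt(max_retries), reraise=True)

class Client:
    def __init__(self, max_retries: int = 1):
        self.max_retries = max_retries  # never consulted by the decorator below

    @make_retry_decorator(max_retries=6)  # bound when the class is defined
    def call(self) -> None:
        raise RuntimeError("boom")

Client(max_retries=1).call()  # still attempts the call 6 times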
As for the changes between version 0.10.19 and the latest version 0.10.20, there were several new features and bug fixes introduced, but none of them seem to directly address this specific issue. You can find more details about these changes in the CHANGELOG.md.
The purpose of llama_index retrying errors such as openai.AuthenticationError is to handle transient failures that can occur when making API requests to OpenAI. This retry mechanism is crucial for maintaining the reliability and stability of LlamaIndex's functionality, especially in scenarios where network issues or temporary OpenAI service disruptions might cause authentication errors or other types of failures.
If you need to override the max_retries value for a specific use case, you can do so by calling create_retry_decorator with a different max_retries value and applying the returned decorator to the desired function. However, please note that this would require changes to the LlamaIndex codebase and should be done with caution to avoid unintended side effects.
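For example (a sketch, assuming create_retry_decorator is importable from llama_index.llms.openai.utils as in the sources listed below; chat_once is a hypothetical wrapper of your own, not a library function):

from llama_index.llms.openai.utils import create_retry_decorator

my_retry = create_retry_decorator(
    max_retries=1,
    random_exponential=True,
    stop_after_delay_seconds=10,
    min_seconds=1,
    max_seconds=5,
)

@my_retry
def chat_once(llm, messages):
    # Your own call site, retried according to the decorator above
    # instead of the library's default of 6 attempts.
    return llm.chat(messages)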
The specific choice of 6 retries is a balance the developers of this code have chosen, likely based on their understanding of the reliability of the services being called and the importance of not giving up on a request too early. The exact reasoning might rest on empirical data, industry standards, or specific requirements of the application or the API being called.
I hope this helps clarify the issue. If you have further questions or need more assistance, feel free to ask.
Sources
- [Question]: AuthenticationError when using OpenAI Embedding API in llama_index
- llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py
- CHANGELOG.md
- llama-index-legacy/llama_index/legacy/llms/openai_utils.py
What's the issue here? I'm not entirely sure how to solve this either.
The openai client manages retries for some basic things.
Beyond that, tenacity is providing retries for specific error types (authentication errors aren't in here, at least from what I can tell). Maybe one of these error types is a parent to authentication errors? https://github.com/run-llama/llama_index/blob/cf5f75a228ecd8e8cd423f0ada61eca2d97fe96f/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py#L162
If you have an idea for a PR, go for it
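A quick way to check that parent-class question against the installed openai SDK (a verification snippet, not from the linked file):

import openai

# In the v1 SDK, AuthenticationError subclasses APIStatusError, so any
# retry predicate matching APIStatusError also matches auth failures.
print(openai.AuthenticationError.__mro__)
print(issubclass(openai.AuthenticationError, openai.APIStatusError))  # True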
Beyond that, tenacity is providing retries for specific error types (authentication errors aren't in here, at least from what I can tell). Maybe one of these error types is a parent to authentication errors?
I tested manually; it is not the specific list of errors but the llm_retry_decorator itself that is the problem.
@logan-markewich can we just remove this global decorator completely? Or at least add an option to do so?
Not entirely familiar with tenacity, but I see that AuthenticationError is a subclass of APIStatusError (openai SDK), which is an error type that will trigger a retry.
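One possible direction for a fix is a predicate that carves auth errors out before the subclass check (a sketch using tenacity's retry_if_exception; the retryable-error list here is illustrative, not the library's exact one):

import openai
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_random_exponential

def _is_retryable(exc: BaseException) -> bool:
    # Never retry auth errors, even though they subclass APIStatusError.
    if isinstance(exc, openai.AuthenticationError):
        return False
    return isinstance(
        exc,
        (openai.APIConnectionError, openai.APITimeoutError, openai.RateLimitError),
    )

retry_decorator = retry(
    reraise=True,
    stop=stop_after_attempt(3),
    wait=wait_random_exponential(min=1, max=20),
    retry=retry_if_exception(_is_retryable),
)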
Hi, I have a bug. Could you help me solve it?
1it [00:00, ?it/s]
0%| | 0/1 [00:00<?, ?it/s]
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.8624432450322062 seconds as it raised APIConnectionError: Connection error..
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.3647944153096849 seconds as it raised APIConnectionError: Connection error..
Traceback (most recent call last):