
[Question]: RateLimit Embedding

Open mirallm opened this issue 9 months ago • 1 comments

Question Validation

  • [x] I have searched both the documentation and discord for an answer.

Question

Hello, I am getting this warning and would like to handle it. @dosubot

WARNING:llama_index.embeddings.openai.utils:Retrying llama_index.embeddings.openai.base.OpenAIEmbedding._aget_text_embeddings.._retryable_aget_embeddings in 0.4735038183779753 seconds as it raised RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}.

Based on my code below, how can I manage it?

        :param nodes: List of TextNodes to index.
        :param embed_model: The model used for embedding the documents.
        """
        from llama_index.core.data_structs.data_structs import IndexDict

        try:
            if self.vector_index is None:
                # Create a new index if not already created
                self.vector_index = VectorStoreIndex(
                    nodes=nodes,
                    use_async=False,
                    embed_model=embed_model,
                    storage_context=self.storage_context,
                    show_progress=True,
                )
                print(self.vector_index.index_struct)
                print("Created new index.")
            else:
                # If the index exists, load it and insert the nodes
                print("Loading existing index and inserting nodes.")
                await self.load_index(embed_model)
                index_struct = self.vector_index._build_index_from_nodes(nodes)
                await self.vector_index._async_add_nodes_to_index(
                    index_struct=index_struct,
                    nodes=nodes,
                    show_progress=True,
                    min_seconds=12,
                    max_seconds=120,
                )
        except Exception as e:
            print(f"An error occurred while embedding the file: {e}")
        
        @dosubot

mirallm avatar Feb 27 '25 08:02 mirallm

To manage the RateLimitError when using the OpenAIEmbedding with the Azure OpenAI API, you can implement a retry mechanism using the tenacity library. This approach allows you to handle rate limit errors by retrying the request with an exponential backoff strategy. Here's a code snippet that demonstrates how to set up this retry mechanism:

from tenacity import retry, wait_random_exponential, stop_after_attempt

# Define your retry strategy
retry_decorator = retry(
    wait=wait_random_exponential(min=4, max=10),  # Random exponential backoff between 4 and 10 seconds
    stop=stop_after_attempt(5)  # Maximum number of retries
)

# Apply the retry decorator to your function that makes the API call
@retry_decorator
async def call_openai_embedding():
    # Your code to call the OpenAI API
    pass

# Use the function
await call_openai_embedding()

In your existing code, you can apply this retry strategy to the part where you are calling the OpenAI API for embeddings. This will help manage the rate limit by automatically retrying the request if a RateLimitError occurs, with a delay that increases exponentially between retries [1].

To continue talking to Dosu, mention @dosu.


dosubot[bot] avatar Feb 27 '25 08:02 dosubot[bot]