
How to solve rate limit during `load_data`?

Slach opened this issue 2 years ago · 9 comments

How do I resolve the following error?

openai.error.RateLimitError: Rate limit reached for default-global-with-image-limits in organization org-xxx
on requests per min. Limit: 60.000000 / min. Current: 120.000000 / min. Contact [email protected] if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://beta.openai.com/account/billing to add a payment method.

I use the following code for the indexer:

from gpt_index import GPTSimpleVectorIndex
from gpt_index.readers.database import DatabaseReader

documents = DatabaseReader(
    uri='clickhouse+native://localhost:9000/default',
).load_data(
    query="SELECT concat( arrayStringConcat( "
          "  arrayMap("
          "     x -> dictGet('default.clickhouse_telegram_dict_hierarchical','text', x), "
          "     dictGetHierarchy('default.clickhouse_telegram_dict_hierarchical', id)"
          "  ), '\n'"
          "), '\n', text) AS t "
          "FROM default.clickhouse_telegram_data WHERE type='message' AND reply_to_message_id!=0"
)
index = GPTSimpleVectorIndex(documents)
index.save_to_disk('data/gpt-index.json')
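As the stack trace further down shows, GPTSimpleVectorIndex embeds every chunk with its own OpenAI request while it is being built, which is what blows through the 60-requests/min budget. One blunt workaround is to throttle the client itself before building the index; a minimal sketch for the openai<1.0 SDK used here (the 1.1 s pause is an illustrative guess, not a gpt_index feature):

import time

import openai

# Keep a reference to the real call, then wrap it with a pause.
_original_create = openai.Embedding.create

def _throttled_create(*args, **kwargs):
    time.sleep(1.1)  # ~55 requests/min, just under the 60/min limit
    return _original_create(*args, **kwargs)

openai.Embedding.create = _throttled_create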

Slach avatar Jan 29 '23 20:01 Slach

@Slach is this error still occurring? We have some exponential backoff mechanisms, but the simplest fix here may just be to retry, since RateLimitErrors are sporadic.
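For illustration, a minimal outer retry around the whole indexing call might look like this; note that gpt_index's internal tenacity wrapper surfaces exhausted retries as tenacity.RetryError rather than the raw RateLimitError, so the sketch catches both (the function name and delays are assumptions, not library API):

import time

import openai
import tenacity
from gpt_index import GPTSimpleVectorIndex

def build_index_with_retry(documents, max_attempts=5, base_delay=15):
    for attempt in range(1, max_attempts + 1):
        try:
            return GPTSimpleVectorIndex(documents)
        except (openai.error.RateLimitError, tenacity.RetryError):
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * attempt)  # back off a little longer each time

Each failed attempt re-embeds everything from scratch, so this only makes sense while the failures really are sporadic.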

jerryjliu avatar Jan 29 '23 22:01 jerryjliu

Yes, it's still constantly occurring. How do I turn on and control this exponential backoff for the rate limit?

Currently I'm using gpt-index 0.2.17.

How do I turn on debug logs for `load_data`?
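For reference, gpt_index of this era logs through Python's standard logging module (assuming nothing has reconfigured it), so a sketch like this should surface debug output:

import logging
import sys

# Send DEBUG-level output from all libraries (gpt_index, openai, urllib3, ...)
# to stdout; run this before load_data() and before building the index.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)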

Slach avatar Jan 30 '23 04:01 Slach

Same here. I ran the same example notebook on Google Colab (see here) and it raised RateLimitError: You exceeded your current quota, please check your plan and billing details.

Raychanan avatar Jan 30 '23 22:01 Raychanan

In 0.3.4 the error happens just a little bit later:

openai.error.RateLimitError: Rate limit reached for default-global-with-image-limits 
in organization org-p8rdKQOhFbDRNO96potZ218G on requests per min. 
Limit: 60.000000 / min. Current: 90.000000 / min. 
Contact [email protected] if you continue to have issues. 
Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing 
to add a payment method.

@jerryjliu how do I resolve this issue?

Slach avatar Feb 03 '23 08:02 Slach

Full stack trace below. How do I properly set up rate limits for GPTSimpleVectorIndex?

Traceback (most recent call last):
  File "/mnt/d/src/github.com/Slach/clickhouse-gpt/indexer.py", line 15, in <module>
    index = GPTSimpleVectorIndex(documents)
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/indices/vector_store/simple.py", line 48, in __init__
    super().__init__(
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/indices/vector_store/base.py", line 43, in __init__
    super().__init__(
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/indices/base.py", line 97, in __init__
    self._index_struct = self.build_index_from_documents(
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/token_counter/token_counter.py", line 54, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/indices/base.py", line 215, in build_index_from_documents
    return self._build_index_from_documents(documents, verbose=verbose)
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/indices/vector_store/base.py", line 74, in _build_index_from_documents
    self._add_document_to_index(index_struct, d, text_splitter)
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/indices/vector_store/simple.py", line 70, in _add_document_to_index
    text_embedding = self._embed_model.get_text_embedding(n.get_text())
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/embeddings/base.py", line 49, in get_text_embedding
    text_embedding = self._get_text_embedding(text)
  File "/home/slach/.local/lib/python3.10/site-packages/gpt_index/embeddings/openai.py", line 148, in _get_text_embedding
    return get_embedding(text, engine=engine)
  File "/home/slach/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 326, in wrapped_f
    return self(f, *args, **kw)
  File "/home/slach/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 406, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/slach/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 363, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7fc118604460 state=finished raised RateLimitError>]

Slach avatar Feb 03 '23 10:02 Slach

Looks like

@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(
...

in gpt_index/embeddings/openai.py

doesn't work as expected ;(
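The mechanics match the traceback: with wait_random_exponential(min=1, max=20) every wait is capped at 20 seconds, so all six attempts can land inside a single rate-limit window, and when tenacity exhausts its attempts it raises RetryError wrapping the last RateLimitError. A hypothetical, more patient policy for comparison (the numbers are illustrative, not gpt_index's):

import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential

@retry(wait=wait_random_exponential(min=4, max=60), stop=stop_after_attempt(10))
def get_embedding_patched(text, engine="text-embedding-ada-002"):
    # The same call the library makes; only the retry policy differs.
    text = text.replace("\n", " ")
    return openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"]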

Slach avatar Feb 03 '23 10:02 Slach

I think the minimum wait time should be at least 60 seconds to account for free-tier OpenAI users.

Mikkolehtimaki avatar Feb 05 '23 16:02 Mikkolehtimaki

No, I think stop_after_attempt(100) will be enough. A retry policy means retrying when a request to OpenAI fails.

Slach avatar Feb 05 '23 16:02 Slach

Hi @jerryjliu, I've opened a pull request to solve this rate limit error. I've increased the wait time range as well as the max retries, and it worked for me.

(@Slach feel free to comment on the PR as well)

ajndkr avatar Feb 09 '23 16:02 ajndkr

I am still facing the same error with the latest version. I have a paid plan.

sid-metricpath avatar Jul 24 '23 17:07 sid-metricpath

We're also facing this issue @jerryjliu, simply by cloning the main repo.

gorkamolero avatar Jul 25 '23 10:07 gorkamolero

Also facing this error with a vanilla setup:

from llama_index import VectorStoreIndex, SimpleWebPageReader

documents = SimpleWebPageReader(html_to_text=True).load_data([
   ...
   URLs
   ...
])

index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist("some file path")

I'll note that a friend of mine who is using this exact same code did not run into these issues. Neither of us is using OpenAI API keys.
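For what it's worth, in llama_index of this vintage the embedding batch size is configurable, and batching more texts per request directly lowers requests per minute. A sketch, assuming embed_batch_size behaves as its name suggests in this version:

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# The default is 10 texts per embedding request; a larger batch means fewer
# requests per minute for the same number of chunks.
embed_model = OpenAIEmbedding(embed_batch_size=50)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)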

comalice avatar Aug 24 '23 21:08 comalice