
Azure OpenAI Embedding langchain.embeddings.openai.embed_with_retry won't provide any embeddings after retries.

Open masoumi76 opened this issue 1 year ago • 16 comments

I have the following code:

docsearch = Chroma.from_documents(texts, embeddings, persist_directory=persist_directory)

and get the following error:

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Requests to the Embeddings_Create Operation under Azure OpenAI API version 2022-12-01 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please contact Azure support service if you would like to further increase the default rate limit.

The length of my texts list is less than 100, and as far as I know Azure has a 400 requests/min limit, so I should not be hitting any rate limit. Can someone explain what is happening that results in this error?

After these retries by LangChain, it looks like the embeddings are lost and never stored in the Chroma DB. Could someone please give me a hint about what I'm doing wrong?

using langchain==0.0.125

Many thanks

masoumi76 avatar Apr 06 '23 12:04 masoumi76

+1

xsser avatar Apr 13 '23 16:04 xsser

+1

Thystler avatar Apr 17 '23 15:04 Thystler

Any suggestion would be greatly appreciated!

masoumi76 avatar Apr 18 '23 08:04 masoumi76

+1

Peter-Devine avatar Apr 25 '23 07:04 Peter-Devine

I set max_retries = 10. I am still getting "Retrying langchain.embeddings.openai.embed_with_retry" messages, but I was able to complete the index creation.

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1, max_retries=10)

nepomny avatar Apr 25 '23 13:04 nepomny

+10000

aiswaryasankar avatar Apr 25 '23 23:04 aiswaryasankar

Any solution to fix this issue? +1

nitsahoo-hs avatar May 17 '23 18:05 nitsahoo-hs

As far as I know, the Azure OpenAI embedding service is different from the official OpenAI embeddings API: it doesn't let us use Chroma.from_documents directly; instead, we need to call the Azure OpenAI embedding API for it.

zxs731 avatar May 24 '23 01:05 zxs731

+1111

Levilian avatar May 24 '23 05:05 Levilian

I tried something like this:

embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_texts(texts=["example1", "example2"], embedding=embeddings)

and

vector_store = Chroma.from_texts(texts=["example1", "example2"], embedding=embeddings)

got: Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details..

I'm passing a list that has a length of 2, and it is giving me RateLimitError.

Tried two versions of LangChain, 0.0.162 and 0.0.188, and both produced the same error.

EricLee911110 avatar Jun 01 '23 19:06 EricLee911110

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIError: The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 25115737c4fe3e6d4deef4961066ba2e in your email.) {
  "error": {
    "message": "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 25115737c4fe3e6d4deef4961066ba2e in your email.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}
 500 {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID 25115737c4fe3e6d4deef4961066ba2e in your email.)', 'type': 'server_error', 'param': None, 'code': None}} {'Date': 'Fri, 16 Jun 2023 01:43:24 GMT', 'Content-Type': 'application/json', 'Content-Length': '366', 'Connection': 'keep-alive', 'access-control-allow-origin': '*', 'openai-organization': 'provectus-algae-pem6gx', 'openai-processing-ms': '5602', 'openai-version': '2020-10-01', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'x-ratelimit-limit-requests': '3000', 'x-ratelimit-remaining-requests': '2999', 'x-ratelimit-reset-requests': '20ms', 'x-request-id': '25115737c4fe3e6d4deef4961066ba2e', 'CF-Cache-Status': 'DYNAMIC', 'Server': 'cloudflare', 'CF-RAY': '7d7f5c6ebacfa83e-SYD', 'alt-svc': 'h3=":443"; ma=86400'}.

Killing me, I've sent through a single request (on a paid plan) and am being rate limited on embeddings.

tim-g-provectusalgae avatar Jun 16 '23 01:06 tim-g-provectusalgae

After a bit of digging, I suspect two causes:

  1. If you were using free credits and they ran out, and you moved to a pay-as-you-go plan with OpenAI, you may need to create a new API key.
  2. You're hitting the rate limit on requests per minute. I found this notebook from OpenAI explaining ways to get around it. I haven't tested it yet, but I will report back if I make any headway: How_to_handle_rate_limits.ipynb
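The technique in that notebook, retrying with exponential backoff, can be sketched in plain Python (illustrative names only, not LangChain's actual implementation):

```python
import random
import time

def retry_with_backoff(fn, max_retries=6, base_delay=0.01, max_delay=1.0):
    """Call fn(); on failure, sleep an exponentially growing, jittered delay and retry."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(delay * (1 + random.random()))  # jitter avoids synchronized retries
            delay = min(delay * 2, max_delay)

# Demo: a call that fails twice with a fake 429, then succeeds on the third try.
attempts = {"n": 0}
def flaky_embed():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429: rate limited")
    return [0.1, 0.2, 0.3]

print(retry_with_backoff(flaky_embed))  # [0.1, 0.2, 0.3] after two retries
```

In real use, `fn` would be the embedding call; `max_retries` here plays the same role as the `max_retries` argument mentioned earlier in this thread.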

I will try to implement a fix that limits the rate of requests made per minute (as if the langchain community doesn't already have one somewhere).
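A minimal client-side pacer along those lines could look like this (a sketch with illustrative names, not an existing LangChain API):

```python
import time

class RequestPacer:
    """Spaces out calls so that at most `per_minute` requests are issued, evenly paced."""
    def __init__(self, per_minute):
        self.min_interval = 60.0 / per_minute
        self._last = None

    def wait(self):
        # Sleep just long enough to keep min_interval between consecutive calls.
        now = time.monotonic()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

# Demo with an unrealistically high rate so it runs instantly; in real use you
# would pick something under your quota (e.g. per_minute=300) and call
# pacer.wait() before each embedding request.
pacer = RequestPacer(per_minute=6000)  # one call every 10 ms
start = time.monotonic()
for _ in range(5):
    pacer.wait()
elapsed = time.monotonic() - start
print(f"5 paced calls took {elapsed:.3f}s")  # at least 4 * 0.01 = 0.04s
```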

AaronWard avatar Jun 27 '23 20:06 AaronWard

+1

meanirban100 avatar Jul 02 '23 14:07 meanirban100

Getting this for FAISS.from_documents(data, embeddings):

Traceback (most recent call last):
  File "/app/scheduler/4_generate_embeddings.py", line 52, in <module>
    vectors = FAISS.from_documents(data, embeddings)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/vectorstores/base.py", line 332, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 517, in from_texts
    embeddings = embedding.embed_documents(texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 452, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 302, in _get_len_safe_embeddings
    response = embed_with_retry(
               ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 97, in embed_with_retry
    return _embed_with_retry(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 95, in _embed_with_retry
    return embeddings.client.create(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_resources/embedding.py", line 33, in create
    response = super().create(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 230, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 624, in _interpret_response
    self._interpret_response_line(
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 687, in _interpret_response_line
    raise self.handle_error_response(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/api_requestor.py", line 337, in handle_error_response
    raise error.APIError(
openai.error.APIError: Invalid response object from API: '{ "statusCode": 500, "message": "Internal server error", "activityId": "......." }' (HTTP response code was 500)

aiakubovich avatar Jul 03 '23 06:07 aiakubovich

Getting the same error using Azure OpenAI with openai.api_version = "2023-05-15"

Creating my embeddings:

from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(chunk_size=1, openai_api_version=openai.api_version, openai_api_key=openai.api_key, openai_api_type=openai.api_type,
 openai_api_base=openai.api_base, deployment="ChatGPTEmbeddings", model="text-embedding-ada-002")

Creating vector store index:

 index = VectorstoreIndexCreator(
    embedding = embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

Receiving this error in a loop; the cell ran for 1 minute and 51 seconds:

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: Requests to the Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 1 second. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit..

marielaquino avatar Jul 05 '23 21:07 marielaquino

+1000

JosefButts avatar Jul 07 '23 22:07 JosefButts

+1

theNullP0inter avatar Jul 29 '23 09:07 theNullP0inter

+1

OpenAI's developer experience is pretty frustrating.

niznet89 avatar Jul 30 '23 16:07 niznet89

There are two possible solutions:

  1. You can request a quota increase; this has been confirmed by MS support.
  2. Since July, Azure OpenAI supports embedding with a chunk size of 16. You can find detailed usage information in this reference: https://m.bilibili.com/video/BV1oP411r7g6 (unfortunately, the video content from 1:30 on is in Chinese only).

zxs731 avatar Aug 02 '23 09:08 zxs731

I also hit this problem, but when I retried later the bug had disappeared. SOS

HowdyHuang avatar Aug 08 '23 09:08 HowdyHuang

I'm having this issue today, but not yesterday:

2023-08-08 14:56:18 INFO error_code=429 error_message='Requests to the Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.' error_param=None error_type=None message='OpenAI API error received' stream_error=False
2023-08-08 14:56:18 WARNING Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: Requests to the Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 2 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit..

spierp-hd avatar Aug 09 '23 17:08 spierp-hd

I also hit this problem, but when I retried later the bug had disappeared. SOS

Can you please share the source code you used?

parth-patel2023 avatar Aug 14 '23 07:08 parth-patel2023

I also hit this problem, but when I retried later the bug had disappeared. SOS

Can you please share the source code you used?

My error message is slightly different from the title. I think it was a transient network problem surfaced through the langchain library; I cannot reproduce it with the same code now.

The error looks like this: Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIError: Invalid response object from API: '{ "statusCode": 500, "message": "Internal server error", "activityId": "xxxx" }' (HTTP response code was 500).

The bug hasn't happened again and doesn't affect me. Thank you~

HowdyHuang avatar Aug 14 '23 07:08 HowdyHuang

The OpenAI API rate limit is a big problem. But OpenAI embeddings are not the best anyway, so it can make sense to just use a free one (see https://huggingface.co/spaces/mteb/leaderboard).

aiakubovich avatar Aug 14 '23 16:08 aiakubovich

Define the following values in the code 👍:

openai.api_type = "azure"
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = "your api key"
os.environ["OPENAI_API_BASE"] = "put yours"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"

llm = AzureOpenAI(
    api_key="your api key",
    api_base="put yours",
    api_version="2023-03-15-preview",
    deployment_name="name of the deployment",
)

llm_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)

This will definitely work with Chroma and FAISS DBs.

meakshayraut avatar Aug 23 '23 10:08 meakshayraut

Someone solved the issue?

elorberb avatar Nov 16 '23 07:11 elorberb

@elorberb Define the following values in the code 👍:

openai.api_type = "azure"
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = "your api key"
os.environ["OPENAI_API_BASE"] = "put yours"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"

llm = AzureOpenAI(
    api_key="your api key",
    api_base="put yours",
    api_version="2023-03-15-preview",
    deployment_name="name of the deployment",
)

llm_embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)

meakshayraut avatar Nov 16 '23 07:11 meakshayraut

To improve embedding performance, you can set chunk_size to 16, but first you need to update the API version to "2023-07-01-preview":

os.environ["OPENAI_API_VERSION"] = "2023-07-01-preview"

And the deployment name should not be forgotten:

embeddings = OpenAIEmbeddings(
        deployment="your deployment name",
        model="text-embedding-ada-002",
        chunk_size=16
)

zxs731 avatar Nov 17 '23 08:11 zxs731

What does the 16 actually do here?

meakshayraut avatar Nov 17 '23 08:11 meakshayraut

It's the number of texts sent to the API in each embedding request (the batch size), so 16 texts are embedded per call instead of one.
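Concretely, chunk_size controls how many texts go into one request, so fewer requests hit the rate limiter. A plain-Python illustration of the splitting (batched here is a hypothetical helper, not LangChain's code):

```python
def batched(texts, chunk_size=16):
    """Split texts into the per-request batches that chunk_size controls."""
    return [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]

docs = [f"doc {i}" for i in range(40)]
print([len(b) for b in batched(docs)])              # [16, 16, 8] -> 3 API requests
print(len(batched(docs, chunk_size=1)))             # 40 -> 40 API requests
```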

zxs731 avatar Nov 17 '23 08:11 zxs731