Using AzureOpenAIEmbeddings throws input string is not valid when trying to embed a string

Open Govindarajan-D opened this issue 1 year ago • 3 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangChain documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangChain rather than my code.
  • [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I imported the embeddings class from the langchain library:

from langchain_openai.embeddings import AzureOpenAIEmbeddings

And then built the embedding model like below:

embedding_model = AzureOpenAIEmbeddings(
    azure_endpoint=AOAI_ENDPOINT,
    openai_api_key=AOAI_KEY,
)

When I run a simple _tokenize call, it succeeds:

print(embedding_model._tokenize(["Test", "Message"], 2048))

But if I try to embed a query, it throws an error saying 'Input should be a valid string':

print(embedding_model.embed_query("Test Message"))

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "c:\Users\govindarajand\backend-llm-model\stock_model\embed-test.py", line 55, in <module>
    print(embedding_model.embed_query("Test Message"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_openai\embeddings\base.py", line 530, in embed_query
    return self.embed_documents([text])[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_openai\embeddings\base.py", line 489, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=engine)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_openai\embeddings\base.py", line 347, in _get_len_safe_embeddings
    response = self.client.create(
               ^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\resources\embeddings.py", line 114, in create
    return self._post(
           ^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'type': 'string_type', 'loc': ['body', 'input', 'str'], 'msg': 'Input should be a valid string', 'input': [[2323, 4961]]}, {'type': 'string_type', 'loc': ['body', 'input', 'list[str]', 0], 'msg': 'Input should be a valid string', 'input': [2323, 4961]}]}
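The two `detail` entries in the 422 response suggest that the endpoint validates `input` as either a string or a list of strings, and that what it received was a list of token IDs (`[[2323, 4961]]`). A minimal sketch of that distinction follows; `accepts_input` is a hypothetical stand-in for whatever validation the server performs, not its actual code.

```python
from typing import Any


def accepts_input(value: Any) -> bool:
    """Hypothetical stand-in for the endpoint's check: str or list[str] only."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


# Payload quoted in the 422 response: one document encoded as token IDs.
token_payload = [[2323, 4961]]
# The same request expressed as plain text.
text_payload = ["Test Message"]

print(accepts_input(token_payload))  # False
print(accepts_input(text_payload))   # True
```

Read this way, the request failed on the shape of the payload rather than on the key or the endpoint URL.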

Description

I am trying to use AzureOpenAIEmbeddings from langchain_openai.embeddings, but I get an error when embedding even a simple string. I was using the embedding model with Vector Search and kept getting an error; after a few hours of debugging I found that the embedding model itself was the problem.

To rule out an issue in my own code, I reduced the embedding code to its simplest form and ran it, but I still got the error.

System Info

langchain==0.0.352
langchain-community==0.0.20
langchain-core==0.1.52
langchain-openai==0.1.6

Govindarajan-D avatar May 12 '24 09:05 Govindarajan-D

Let me see.

liugddx avatar May 13 '24 02:05 liugddx

Did you deploy your embedding endpoint in Azure? If not then try that.

I don't think this is an issue with langchain.

asd878988 avatar May 16 '24 20:05 asd878988

@jackbullen Yes, it is deployed. I am converting existing code to run with langchain.

In my existing code, I use AzureOpenAI from the openai library and call embeddings.create, and it works without issue. As I mentioned in the issue description, _tokenize works in langchain but embed_query does not.

Govindarajan-D avatar May 17 '24 05:05 Govindarajan-D

Yes it is deployed.

OK, I asked did you deploy it.

If you did and it's still giving the same error then idk, but a temporary workaround is passing the string query rather than tokens into OpenAIEmbeddings.client.create
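A minimal sketch of that workaround: send the raw strings straight to the embeddings client so that no token IDs go over the wire. It assumes `client` behaves like `embedding_model.client` from the issue, i.e. an object with `create(input=..., model=...)` whose response exposes `.data[i].embedding`. The helper name `embed_strings`, the model name, and the fake client used for the offline dry run are all hypothetical.

```python
from types import SimpleNamespace
from typing import Any, List


def embed_strings(client: Any, texts: List[str], model: str) -> List[List[float]]:
    """Send plain strings (not token IDs) and return one vector per text."""
    response = client.create(input=texts, model=model)
    return [item.embedding for item in response.data]


class FakeEmbeddingsClient:
    """Offline stand-in so the sketch can run without credentials."""

    def create(self, input: List[str], model: str) -> SimpleNamespace:
        # Mirror the validation seen in the 422: only strings are accepted.
        if not all(isinstance(text, str) for text in input):
            raise TypeError("Input should be a valid string")
        # Return a dummy one-dimensional vector (the text length) per input.
        return SimpleNamespace(
            data=[SimpleNamespace(embedding=[float(len(text))]) for text in input]
        )


vectors = embed_strings(
    FakeEmbeddingsClient(), ["Test Message"], model="my-embedding-deployment"
)
print(vectors)  # [[12.0]]
```

With a real setup the first argument would presumably be `embedding_model.client`, and `model` would be whatever name the endpoint expects.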

asd878988 avatar May 18 '24 15:05 asd878988

Hi, any updates since? Thanks!

tindo2003 avatar May 23 '24 04:05 tindo2003

Sorry about that. I did not deploy the OpenAI model in Azure myself; I am using it with an API key. Embedding works when I call AzureOpenAI.embeddings.create directly and fails only when I go through langchain.

Govindarajan-D avatar May 23 '24 09:05 Govindarajan-D

I am getting the same issue. Perhaps the deployment name is different from the model name, as indicated by https://github.com/langchain-ai/langchain/issues/1560?
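If the deployment name does differ from the model name, it can be passed explicitly. This is a sketch only, assuming the `azure_deployment` and `openai_api_version` parameters of `AzureOpenAIEmbeddings`. The environment variable `AZURE_OPENAI_EMBEDDING_DEPLOYMENT` and both helper names are hypothetical, and whether this resolves the 422 in this thread is untested.

```python
import os
from typing import Dict, Mapping


def azure_embedding_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect constructor arguments, naming the deployment explicitly."""
    return {
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "openai_api_key": env["AZURE_OPENAI_API_KEY"],
        "openai_api_version": env["OPENAI_API_VERSION"],
        # Hypothetical variable holding the deployment name, which may
        # differ from the name of the underlying model.
        "azure_deployment": env["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"],
    }


def build_embedding_model(env: Mapping[str, str] = os.environ):
    """Build the model; imported lazily so the helper above runs without LangChain."""
    from langchain_openai import AzureOpenAIEmbeddings

    return AzureOpenAIEmbeddings(**azure_embedding_kwargs(env))
```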

tindo2003 avatar May 23 '24 12:05 tindo2003

I am facing the same issue. Has it been resolved yet? I have a proxy endpoint provided by the hackathon I am taking part in, and langchain's AzureOpenAIEmbeddings gives me a 422 error.

amanjam avatar Jun 13 '24 19:06 amanjam

Setting check_embedding_ctx_length=False works for me, e.g.:

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model=model_name,
    openai_api_key="key",
    base_url=url,
    check_embedding_ctx_length=False,
)
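The flag appears to control whether texts are tokenized client-side before being sent. With the default (`True`) the request carries token IDs, which matches the `[[2323, 4961]]` in the 422 above; with `False` the raw strings are sent. Below is a simplified model of that branch, not the library's code; `build_input` and the toy tokenizer are hypothetical.

```python
from typing import Callable, List, Union

Tokenizer = Callable[[str], List[int]]


def build_input(
    texts: List[str], check_ctx_length: bool, tokenize: Tokenizer
) -> Union[List[str], List[List[int]]]:
    """Return what would go into the request's `input` field."""
    if check_ctx_length:
        # Length-safe path: each text becomes a list of token IDs.
        return [tokenize(text) for text in texts]
    # Flag disabled: the strings are passed through untouched.
    return list(texts)


def toy_tokenize(text: str) -> List[int]:
    """Toy tokenizer: one fake ID per whitespace-separated word (its length)."""
    return [len(word) for word in text.split()]


print(build_input(["Test Message"], True, toy_tokenize))   # [[4, 7]]
print(build_input(["Test Message"], False, toy_tokenize))  # ['Test Message']
```

An endpoint that accepts only strings would reject the first payload and accept the second, which is consistent with the flag fixing the error for proxy-style endpoints.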

TEH000 avatar Jul 03 '24 01:07 TEH000