langchain
Using AzureOpenAIEmbeddings throws "Input should be a valid string" when trying to embed a string
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
I imported the embeddings class from langchain_openai:
from langchain_openai.embeddings import AzureOpenAIEmbeddings
Then I built the embedding model as follows:
embedding_model = AzureOpenAIEmbeddings(
    azure_endpoint=AOAI_ENDPOINT,
    openai_api_key=AOAI_KEY,
)
Calling the private _tokenize method directly succeeds:
print(embedding_model._tokenize(["Test","Message"],2048))
But embedding a query throws an error saying 'Input should be a valid string':
print(embedding_model.embed_query("Test Message"))
Error Message and Stack Trace (if applicable)
Traceback (most recent call last):
File "c:\Users\govindarajand\backend-llm-model\stock_model\embed-test.py", line 55, in
    return self._post(
           ^^^^^^^^^^^
File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'type': 'string_type', 'loc': ['body', 'input', 'str'], 'msg': 'Input should be a valid string', 'input': [[2323, 4961]]}, {'type': 'string_type', 'loc': ['body', 'input', 'list[str]', 0], 'msg': 'Input should be a valid string', 'input': [2323, 4961]}]}
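The 422 body in the traceback above reports two failed alternatives for the `input` field, `str` and `list[str]`, and the rejected value is a list of integer lists, i.e. token IDs rather than text. That is the pattern you would expect from a server that validates `input` as `str | list[str]`. The `detail`/`loc` layout resembles a FastAPI/pydantic validation error, so the endpoint may be a proxy in front of the model. That is an inference, not something confirmed in this thread. Below is a minimal sketch of such a check, assuming that schema. `validate_embedding_input` is a hypothetical helper, not part of LangChain or the openai library.

```python
from typing import Any

STRING_MSG = "Input should be a valid string"


def validate_embedding_input(value: Any) -> list[dict[str, Any]]:
    """Check `value` against a hypothetical `input: str | list[str]` schema.

    Returns [] when either branch of the union matches; otherwise one error
    for the `str` branch plus the errors for the `list[str]` branch, shaped
    like the 422 body in the traceback.
    """
    if isinstance(value, str):
        return []  # matches the `str` branch

    if isinstance(value, list):
        bad_items = [
            {"loc": ["body", "input", "list[str]", i], "msg": STRING_MSG, "input": item}
            for i, item in enumerate(value)
            if not isinstance(item, str)
        ]
        if not bad_items:
            return []  # matches the `list[str]` branch
    else:
        bad_items = [
            {"loc": ["body", "input", "list[str]"], "msg": "Input should be a valid list", "input": value}
        ]

    # Neither branch matched: report the `str` failure first, then the list failures.
    return [{"loc": ["body", "input", "str"], "msg": STRING_MSG, "input": value}, *bad_items]


if __name__ == "__main__":
    print(validate_embedding_input("Test Message"))    # []
    print(validate_embedding_input(["Test Message"]))  # []
    # Token IDs, as in the traceback, fail both branches:
    for error in validate_embedding_input([[2323, 4961]]):
        print(error["loc"], error["msg"], error["input"])
```

Under this assumption, a plain string or a list of strings passes, while the token-ID payload reproduces both errors from the traceback.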
Description
I am trying to use AzureOpenAIEmbeddings from langchain_openai.embeddings, but I get an error when embedding even a simple string. I was originally using embedding_model with vector search and kept getting an error; after a few hours of debugging, I traced the problem to the embedding model itself.
To rule out a problem in my own code, I reduced the embedding code to its simplest form, but it still fails with the same error.
System Info
langchain==0.0.352 langchain-community==0.0.20 langchain-core==0.1.52 langchain-openai==0.1.6
Let me see.
Did you deploy your embedding endpoint in Azure? If not, try that.
I don't think this is an issue with langchain.
@jackbullen Yes, it is deployed. I am converting existing code to run on LangChain.
In my existing code, I use AzureOpenAI from the openai library and call embeddings.create, and it works without issue. As I mentioned in the issue description, _tokenize works in LangChain but embed_query does not.
Yes it is deployed.
OK, I was asking whether you deployed it.
If you did and it still gives the same error, then I don't know, but a temporary workaround is to pass the query string, rather than tokens, to OpenAIEmbeddings.client.create.
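The workaround suggested above (sending the string itself to the underlying client's create method) can be sketched as a small helper that bypasses LangChain's tokenization. It assumes the client exposes an openai-v1-style `create(input=..., model=...)` whose response has `data[i].embedding`, as `AzureOpenAI(...).embeddings.create` does. With `AzureOpenAI`, `model` is typically the deployment name. `embed_text_directly` and `FakeEmbeddingsClient` are hypothetical names. The fake client only lets the sketch run offline.

```python
from types import SimpleNamespace
from typing import Any, Protocol


class EmbeddingsClient(Protocol):
    """Anything with an openai-v1-style `create` method (e.g. `AzureOpenAI(...).embeddings`)."""

    def create(self, *, input: Any, model: str) -> Any: ...


def embed_text_directly(client: EmbeddingsClient, text: str, model: str) -> list[float]:
    """Send the raw text (not token IDs) and return its embedding vector."""
    response = client.create(input=[text], model=model)
    return list(response.data[0].embedding)


class FakeEmbeddingsClient:
    """Offline stand-in that records what it was sent and returns a fixed vector."""

    def __init__(self) -> None:
        self.last_input: Any = None

    def create(self, *, input: Any, model: str) -> SimpleNamespace:
        self.last_input = input
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3])])


if __name__ == "__main__":
    fake = FakeEmbeddingsClient()
    # In practice you might pass embedding_model.client or AzureOpenAI(...).embeddings
    # instead of the fake, with your own (hypothetical here) deployment name.
    print(embed_text_directly(fake, "Test Message", model="my-embedding-deployment"))
    print(fake.last_input)  # ['Test Message']: plain strings, not token IDs
```

The point of the helper is only the shape of `input`. Everything else (retries, batching) is left to the client.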
Hi, any updates since? Thanks!
Sorry about that, but I did not deploy the OpenAI model in Azure; I am using it with an API key. Embedding works when I use AzureOpenAI.embeddings.create and fails only when I use LangChain.
I am getting the same issue. Perhaps the deployment name is different from the model name, as indicated by https://github.com/langchain-ai/langchain/issues/1560?
I am facing the same issue; has it been resolved yet? I have a proxy endpoint provided by the hackathon I am taking part in, and LangChain's AzureOpenAIEmbeddings gives me a 422 error.
Setting check_embedding_ctx_length=False works for me, e.g.:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model=model_name,
    openai_api_key="key",
    base_url=url,
    check_embedding_ctx_length=False,
)
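Why check_embedding_ctx_length=False helps, as far as the traceback suggests: with the check enabled (the default in langchain_openai 0.1.x, as I understand it), texts are tokenized locally and lists of token IDs are sent as `input`, which lets over-long texts be split into context-sized chunks and recombined. With the check disabled, the strings are sent unchanged, which a `str | list[str]` endpoint accepts. The sketch below is a simplified, hypothetical model of that difference, not LangChain's actual code. `build_embedding_input` and `toy_tokenize` are made-up names, and the toy tokenizer is not tiktoken.

```python
from typing import Callable, Union

EmbeddingInput = list[Union[str, list[int]]]


def build_embedding_input(
    texts: list[str],
    check_ctx_length: bool,
    tokenize: Callable[[str], list[int]],
    ctx_length: int = 8191,
) -> EmbeddingInput:
    """Hypothetical model of the `input` payload built for a batch of texts."""
    if not check_ctx_length:
        return list(texts)  # strings go to the API unchanged

    payload: EmbeddingInput = []
    for text in texts:
        tokens = tokenize(text)
        # Split each text into windows of at most `ctx_length` token IDs;
        # max(..., 1) keeps one (empty) entry for a text with no tokens.
        for start in range(0, max(len(tokens), 1), ctx_length):
            payload.append(tokens[start:start + ctx_length])
    return payload


def toy_tokenize(text: str) -> list[int]:
    """Stand-in tokenizer: one made-up ID per whitespace-separated word."""
    return [sum(map(ord, word)) for word in text.split()]


if __name__ == "__main__":
    texts = ["Test Message"]
    print(build_embedding_input(texts, True, toy_tokenize))   # [[416, 709]]
    print(build_embedding_input(texts, False, toy_tokenize))  # ['Test Message']
```

Only the second payload would pass a server that requires strings, which matches the reports in this thread.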