langchain
Missing Azure OpenAI support for "OpenAIEmbeddings"
It's currently not possible to pass a custom deployment name, as the model/deployment names are hard-coded as "text-embedding-ada-002" in variables within the class definition.
In Azure OpenAI, deployment names can be customized, and that doesn't work with the OpenAIEmbeddings class.
There is proper Azure support for the OpenAI LLM wrapper, but it is missing for embeddings.
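To make the asymmetry concrete, here is a minimal sketch assuming the langchain wrappers of that era (the AzureOpenAI LLM class exposes a deployment_name field; the deployment and model names below are placeholders):

from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings

# The LLM wrapper can target an arbitrarily named Azure deployment:
llm = AzureOpenAI(
    deployment_name="my-custom-deployment",
    model_name="text-davinci-003",
)

# The embeddings wrapper has no deployment parameter; the engine it sends to
# the API is fixed to "text-embedding-ada-002", so a custom Azure deployment
# name cannot be used.
embeddings = OpenAIEmbeddings()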
Same issue as #1560
FYI, I worked around this issue by naming my deployment "text-embedding-ada-002" and using the model of the same name. The next issue I ran into was hitting the rate limiter when using embeddings, as the default is 300(?) requests per minute. I had to add my own retrying functionality to the /embeddings/openai.py code, but I am using an older version (0.088?), and it looks like better retrying is being built in in more recent commits.
Just in case it helps you, here is my embed_documents function, with modifications at the end (the final else branch):
import tenacity
from tenacity import retry
from typing import List
...

def embed_documents(
    self, texts: List[str], chunk_size: int = 1000
) -> List[List[float]]:
    """Call out to OpenAI's embedding endpoint for embedding search docs.

    Args:
        texts: The list of texts to embed.
        chunk_size: The maximum number of texts to send to OpenAI at once
            (max 1000).

    Returns:
        List of embeddings, one for each text.
    """
    # handle large batches of texts
    if self.embedding_ctx_length > 0:
        return self._get_len_safe_embeddings(
            texts, engine=self.document_model_name, chunk_size=chunk_size
        )
    else:
        # retry each request every 10 seconds, up to 60 attempts,
        # to ride out the Azure rate limiter
        @retry(wait=tenacity.wait_fixed(10), stop=tenacity.stop_after_attempt(60))
        def addText(text):
            embedding = self._embedding_func(text, engine=self.document_model_name)
            return embedding

        responses = []
        for text in texts:
            responses.append(addText(text))
        return responses
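A small refinement, if you only want to retry on rate-limit errors instead of on every exception: a sketch assuming the pre-1.0 openai package, where rate limits raise openai.error.RateLimitError, and an OpenAIEmbeddings instance with the same internal _embedding_func helper as above.

import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Retry only when the API reports a rate limit, with exponential back-off
# instead of a fixed 10-second wait.
@retry(
    retry=retry_if_exception_type(openai.error.RateLimitError),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(10),
)
def embed_one(embeddings, text):
    # embeddings is an OpenAIEmbeddings instance; _embedding_func is the
    # same internal helper used in the snippet above.
    return embeddings._embedding_func(text, engine=embeddings.document_model_name)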
The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.' in them.
Can you create embeddings with GPT-3.5-turbo?
I'm using

embeddings = OpenAIEmbeddings(chunk_size=1)

for embeddings. By default, OpenAIEmbeddings uses text-embedding-ada-002, so create a deployment named text-embedding-ada-002 backed by the model of the same name. Azure will not work unless you set chunk_size=1 (see the sketch after this comment).
The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.' in them. So this is not really an issue for OpenAIEmbeddings.
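For reference, here is a minimal end-to-end sketch of that workaround; the environment variable names, API version, and resource URL are assumptions about the Azure OpenAI setup of that time, and the deployment must literally be named text-embedding-ada-002.

import os
from langchain.embeddings import OpenAIEmbeddings

# Azure OpenAI configuration; values are placeholders.
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://<your-resource>.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
os.environ["OPENAI_API_KEY"] = "<your-key>"

# Works only if the Azure deployment is named "text-embedding-ada-002" and
# requests are sent one text at a time.
embeddings = OpenAIEmbeddings(chunk_size=1)
vectors = embeddings.embed_documents(["hello", "world"])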
Opened a PR to fix this issue by separating deployment name and model name: https://github.com/hwchase17/langchain/pull/3076
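Assuming the interface introduced by that PR (a separate deployment field alongside the model name; the exact parameter names are inferred from the PR description, not confirmed), usage on Azure would look roughly like:

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    deployment="my-embedding-deployment",  # the Azure deployment name you chose
    model="text-embedding-ada-002",        # the underlying OpenAI model
    chunk_size=1,
)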
Hi, @tunayokumus! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue is about the missing Azure OpenAI support for "OpenAIEmbeddings". The model and deployment names are hard-coded and cannot be customized. One user suggested a workaround by naming the deployment "text-embedding-ada-002" using the model of the same name. Another user mentioned hitting the rate limiter when using embeddings and added their own retrying functionality.
However, it seems that the original author has addressed the issue by opening a pull request to fix it.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain repository!