
Missing Azure OpenAI support for "OpenAIEmbeddings"

Open tunayokumus opened this issue 1 year ago • 7 comments

It's currently not possible to pass a custom deployment name, because the model and deployment names are hard-coded as "text-embedding-ada-002" in variables within the class definition.

In Azure OpenAI, deployment names can be customized, and a custom name does not work with the OpenAIEmbeddings class.

There is proper Azure support for the OpenAI LLM, but it is missing for embeddings.

tunayokumus avatar Mar 10 '23 09:03 tunayokumus

Same issue as #1560

lakshyaag avatar Mar 10 '23 09:03 lakshyaag

FYI, I worked around this issue by naming my deployment "text-embedding-ada-002" and using the model of the same name. The next issue I ran into was hitting the rate limiter when using embeddings, as the default is 300(?) requests per minute. I had to add my own retry logic to the /embeddings/openai.py code, but I am using an older version (0.088?), and it looks like better retrying is being built in with more recent commits.

Just in case it helps you, here is my embed_documents function, with modifications at the end (the last else case):

import tenacity
from tenacity import retry
...

def embed_documents(
    self, texts: List[str], chunk_size: int = 1000
) -> List[List[float]]:
    """Call out to OpenAI's embedding endpoint for embedding search docs.

    Args:
        texts: The list of texts to embed.
        chunk_size: The maximum number of texts to send to OpenAI at once
            (max 1000).

    Returns:
        List of embeddings, one for each text.
    """
    # handle large batches of texts
    if self.embedding_ctx_length > 0:
        return self._get_len_safe_embeddings(
            texts, engine=self.document_model_name, chunk_size=chunk_size
        )
    else:
        # Modification: retry each single-text request, waiting 10 seconds
        # between attempts and giving up after 60 attempts.
        @retry(wait=tenacity.wait_fixed(10), stop=tenacity.stop_after_attempt(60))
        def add_text(text):
            return self._embedding_func(text, engine=self.document_model_name)

        responses = []
        for text in texts:
            responses.append(add_text(text))

        return responses
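
Where adding tenacity as a dependency is not an option, the same fixed-wait retry pattern can be sketched with the standard library alone. This is only an illustration: the names retry_fixed and flaky_embed are hypothetical and belong to neither LangChain nor tenacity, and the helper retries on any exception, which is broader than a real rate-limit handler would want.

```python
import time
from functools import wraps


def retry_fixed(wait_seconds: float, max_attempts: int):
    """Hypothetical helper: retry a call, sleeping a fixed time between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Re-raise once the attempt budget is spent.
                    if attempt == max_attempts:
                        raise
                    time.sleep(wait_seconds)

        return wrapper

    return decorator


# Stand-in for an embedding call that is rate-limited twice before succeeding.
calls = {"count": 0}


@retry_fixed(wait_seconds=0, max_attempts=5)
def flaky_embed(text: str) -> list:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return [float(len(text))]
```

Calling flaky_embed("abc") fails twice internally and returns on the third attempt. With wait_seconds=10 and max_attempts=60 the helper would mirror the settings used in the function above.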

JonAtDocuWare avatar Mar 10 '23 10:03 JonAtDocuWare

The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.'
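
One way to live with this restriction is to derive the deployment name from the model name by dropping the dots. A minimal sketch, assuming only what this comment states (no '.' in deployment names); the function name to_deployment_name is hypothetical, and Azure may impose further naming rules that are not handled here.

```python
def to_deployment_name(model_name: str) -> str:
    """Hypothetical helper: drop '.' so the name is usable as an Azure deployment name."""
    return model_name.replace(".", "")


print(to_deployment_name("gpt-3.5-turbo"))  # gpt-35-turbo
```

Names without a dot, such as text-embedding-ada-002, pass through unchanged.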

geg00 avatar Mar 22 '23 12:03 geg00

The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.'

Can you create embeddings with GPT-3.5-turbo?

JonAtDocuWare avatar Mar 22 '23 12:03 JonAtDocuWare

I'm using embeddings = OpenAIEmbeddings(chunk_size=1) for embeddings. By default, OpenAIEmbeddings uses text-embedding-ada-002. Create a model deployment named text-embedding-ada-002 with the model text-embedding-ada-002.

Azure will not work unless you have chunk_size=1
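
To make the effect of chunk_size concrete: the texts are split into batches of at most chunk_size items and each batch is sent as one request, so chunk_size=1 means one text per request. A minimal sketch of that batching follows; the helper name batch_texts is hypothetical, and this illustrates the idea rather than LangChain's actual implementation.

```python
from typing import Iterator, List


def batch_texts(texts: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive batches of at most chunk_size texts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield texts[start:start + chunk_size]


# With chunk_size=1, every request carries exactly one text.
for batch in batch_texts(["first doc", "second doc", "third doc"], chunk_size=1):
    print(batch)
```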

geg00 avatar Mar 22 '23 12:03 geg00

The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.'

So this is not really an issue for OpenAIEmbeddings

JonAtDocuWare avatar Mar 22 '23 13:03 JonAtDocuWare

Opened a PR to fix this issue by separating deployment name and model name: https://github.com/hwchase17/langchain/pull/3076
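
The idea of separating the two names can be sketched as follows. This is an illustration only, not the code from the PR: the class EmbeddingSettings, its fields, and the engine property are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddingSettings:
    """Hypothetical settings object that keeps model and deployment apart."""

    model: str = "text-embedding-ada-002"
    deployment: Optional[str] = None  # custom Azure deployment name, if any

    @property
    def engine(self) -> str:
        # Use the custom deployment when one is given, else fall back to the model name.
        return self.deployment or self.model


print(EmbeddingSettings().engine)                            # text-embedding-ada-002
print(EmbeddingSettings(deployment="my-embeddings").engine)  # my-embeddings
```

Keeping the fallback means existing callers who never set a deployment name see no change in behavior.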

tunayokumus avatar Apr 18 '23 08:04 tunayokumus

Hi, @tunayokumus! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue is about the missing Azure OpenAI support for "OpenAIEmbeddings". The model and deployment names are hard-coded and cannot be customized. One user suggested a workaround by naming the deployment "text-embedding-ada-002" using the model of the same name. Another user mentioned hitting the rate limiter when using embeddings and added their own retrying functionality.

However, it seems that the original author has addressed the issue by opening a pull request to fix it.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!

dosubot[bot] avatar Sep 20 '23 16:09 dosubot[bot]