
Feature Request: Allow initializing HuggingFaceEmbeddings from cached weights

Open nicolefinnie opened this issue 1 year ago • 5 comments

Motivation

Right now, HuggingFaceEmbeddings doesn't support loading an embedding model's weights from a local cache; it downloads the weights every time. Fixing this would be low-hanging fruit: just allow the user to pass their cache directory.

Suggestion

The change only touches a few lines in __init__():


from typing import Any, List, Optional

from pydantic import BaseModel, Extra

from langchain.embeddings.base import Embeddings

DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"


class HuggingFaceEmbeddings(BaseModel, Embeddings):
    """Wrapper around sentence_transformers embedding models.

    To use, you should have the ``sentence_transformers`` python package installed.

    Example:
        .. code-block:: python

            from langchain.embeddings import HuggingFaceEmbeddings
            model_name = "sentence-transformers/all-mpnet-base-v2"
            hf = HuggingFaceEmbeddings(model_name=model_name)
    """

    client: Any  #: :meta private:
    model_name: str = DEFAULT_MODEL_NAME
    """Model name to use."""

    def __init__(self, cache_folder: Optional[str] = None, **kwargs: Any):
        """Initialize the sentence_transformer."""
        super().__init__(**kwargs)
        try:
            import sentence_transformers
            self.client = sentence_transformers.SentenceTransformer(
                model_name_or_path=self.model_name, cache_folder=cache_folder
            )
        except ImportError:
            raise ValueError(
                "Could not import sentence_transformers python package. "
                "Please install it with `pip install sentence_transformers`."
            )

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Compute doc embeddings using a HuggingFace transformer model.

        Args:
            texts: The list of texts to embed.

        Returns:
            List of embeddings, one for each text.
        """
        texts = list(map(lambda x: x.replace("\n", " "), texts))
        embeddings = self.client.encode(texts)
        return embeddings.tolist()

    def embed_query(self, text: str) -> List[float]:
        """Compute query embeddings using a HuggingFace transformer model.

        Args:
            text: The text to embed.

        Returns:
            Embeddings for the text.
        """
        text = text.replace("\n", " ")
        embedding = self.client.encode(text)
        return embedding.tolist()
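
For illustration, usage with the proposed cache_folder parameter might then look like this (a minimal sketch; the cache path is hypothetical):

import os

from langchain.embeddings import HuggingFaceEmbeddings

# Weights are read from the local cache directory instead of being
# downloaded from the Hugging Face hub on every initialization.
hf = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    cache_folder="/path/to/sentence_transformers_cache",  # hypothetical path
)
query_embedding = hf.embed_query("hello world")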

nicolefinnie avatar Apr 18 '23 09:04 nicolefinnie

I am eager to learn how the solution to this problem is approached. Can you tell me where the weights are located and how they are downloaded? I am a beginner and am excited to see the solution, but I will only contribute if I have a better understanding of the process since I have limited experience in machine learning engineering.

kanukolluGVT avatar Apr 18 '23 10:04 kanukolluGVT

> I am eager to learn how the solution to this problem is approached. Can you tell me where the weights are located and how they are downloaded? I am a beginner and am excited to see the solution, but I will only contribute if I have a better understanding of the process since I have limited experience in machine learning engineering.

Thanks for your quick response. The weights get downloaded when the user doesn't specify a cache_folder and initializes SentenceTransformer() (from the python package sentence_transformers) directly.

By default, if no cache_folder is given, SentenceTransformer looks for the weights in the directory SENTENCE_TRANSFORMERS_HOME (see here); if the weights are not found there, it downloads them from the Hugging Face hub.

So the alternative for users, without changing the LangChain code here, is to set an env variable SENTENCE_TRANSFORMERS_HOME that points to the real weight location. Not ideal, but acceptable. In that case we could document the usage in the LangChain HuggingFaceEmbeddings docstring, but that transfers the complexity to the user, who has to add the env variable to their python script. To make it user-friendly, we could offer this cache_folder option.
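
For example, a minimal sketch of that env-variable workaround (the cache path is hypothetical):

import os

# Point sentence_transformers at the local weight location; set this
# before the model is initialized so nothing is downloaded.
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/path/to/sentence_transformers_cache"  # hypothetical path

from langchain.embeddings import HuggingFaceEmbeddings

hf = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")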

nicolefinnie avatar Apr 18 '23 10:04 nicolefinnie

@nicolefinnie Yup, this makes sense. Thanks for the suggestion!

azamiftikhar1000 avatar Apr 18 '23 10:04 azamiftikhar1000

Can we decode the embeddings?

Chetan-Yeola avatar Apr 28 '23 05:04 Chetan-Yeola

Isn't the dependency on sentence_transformers limiting? E.g. if I wanted to test an OpenAssistant LLM initialized locally from weights, I couldn't use the HuggingFaceEmbeddings class because sentence_transformers doesn't support OpenAssistant. Am I missing something? Are all HF LLMs (i.e. OpenAssistant, LLaMA, Vicuna, etc.) compatible with sentence_transformers embeddings (both the library and the actual model embeddings)?

zlapp avatar Apr 29 '23 22:04 zlapp

Does someone have a working example of initializing HuggingFaceEmbeddings without an internet connection? I have tried specifying the "cache_folder" parameter with the file path of a pre-downloaded embedding model from huggingface, but it seems to be ignored.

mirodrr avatar Aug 05 '23 00:08 mirodrr

Hi, just asking again: Does anyone have a working example of initializing HuggingFaceEmbeddings without an internet connection?

I need to use this class with a pre-downloaded embedding model instead of downloading from huggingface every time.

mirodrr avatar Sep 19 '23 07:09 mirodrr

> Hi, just asking again: Does anyone have a working example of initializing HuggingFaceEmbeddings without an internet connection?
>
> I need to use this class with a pre-downloaded embedding model instead of downloading from huggingface every time.

I have made it work with this method:

import os

from langchain.embeddings import HuggingFaceEmbeddings

# Pass a local checkpoint directory as model_name so the weights are
# loaded from disk instead of downloaded from the Hugging Face hub.
embedding_models_root = "/mnt/embedding_models"
model_ckpt_path = os.path.join(embedding_models_root, 'multi-qa-MiniLM-L6-cos-v1')
embeddings = HuggingFaceEmbeddings(model_name=model_ckpt_path)
text = "This is a test document."
query_result = embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text, "This is not a test document."])

print("==="*20)
print("query_result: \n {}".format(query_result))
print("==="*20)
print("doc_result: \n {}".format(doc_result))
print("==="*20)

bolongliu avatar Oct 24 '23 08:10 bolongliu

Hi, @nicolefinnie! I'm helping the LangChain team manage their backlog and am marking this issue as stale.

It looks like the issue you raised requests adding support for initializing HuggingFaceEmbeddings from cached weights instead of downloading them every time. There have been discussions about potential limitations, working examples, and clarifications on the weight location and download process. One user has even shared a working example of initializing HuggingFaceEmbeddings with pre-downloaded embeddings.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

dosubot[bot] avatar Feb 06 '24 16:02 dosubot[bot]