Precision of HuggingFaceEmbeddings.embed_query changes
System Info
Langchain version: 0.0.173
numpy version: 1.24.3
Related Components
- [X] Embedding Models
Reproduction
from sentence_transformers import SentenceTransformer
import numpy as np
from langchain.embeddings import HuggingFaceEmbeddings
t = 'langchain embedding'
m = HuggingFaceEmbeddings(encode_kwargs={"normalize_embeddings": True})
# SentenceTransformer embeddings with unit norm
x = SentenceTransformer(m.model_name).encode(t, normalize_embeddings=True)
# Langchain.Huggingface embeddings with unit norm
y = m.embed_query(t)
print(f'L2 norm of SentenceTransformer: {np.linalg.norm(x)}. \nL2 norm of Langchain.Huggingface: {np.linalg.norm(y)}')
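The discrepancy can be reproduced without any model at all. The following sketch (my own minimal example, not LangChain code) simulates what happens when a float32 array that was normalized in float32 arithmetic is converted to a list of Python floats and its norm is recomputed in float64:

```python
import numpy as np

# Simulate an embedding: a float32 vector normalized in float32 arithmetic,
# the same dtype SentenceTransformer returns.
rng = np.random.default_rng(0)
v = rng.standard_normal(384).astype(np.float32)
v /= np.linalg.norm(v)  # unit norm, computed in float32

# embed_query ultimately calls .tolist(), which widens each float32
# value to a 64-bit Python float.
as_list = v.tolist()

norm32 = float(np.linalg.norm(v))                   # norm in float32
norm64 = float(np.linalg.norm(np.asarray(as_list)))  # norm in float64

print(f"float32 norm: {norm32}")
print(f"float64 norm of .tolist() values: {norm64}")
```

Both norms are close to 1, but the float64 recomputation generally differs from 1 in the last digits, which matches the 1.0000000445724682 seen above.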
Expected behavior
Both L2 norms should be 1, but I got the following:
L2 norm of SentenceTransformer: 1.0.
L2 norm of Langchain.Huggingface: 1.0000000445724682
I think the problem comes from this code: when the array is converted to a list, the values change slightly (likely because the float32 values are widened to 64-bit Python floats).
In my case, when I used this embedding in a FAISS vector store, the relevance_score I got could not be kept between 0 and 1.
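One workaround, until the precision issue is resolved upstream, is to re-normalize query embeddings in float64 before handing them to the vector store. This is a hedged sketch of my own (`renormalize` is a hypothetical helper, not a LangChain API):

```python
import numpy as np

def renormalize(vec):
    """Re-normalize an embedding in float64 arithmetic so that
    similarity-based relevance scores derived from it stay within
    the expected bounds. Sketch only, not part of LangChain."""
    arr = np.asarray(vec, dtype=np.float64)
    n = np.linalg.norm(arr)
    return (arr / n).tolist() if n > 0 else arr.tolist()

# Example: a vector whose float64 norm is slightly off from 1.
q = renormalize([0.6, 0.8000001])
print(np.linalg.norm(q))
```

After this step the float64 norm is exactly unit-length up to float64 rounding, so score formulas that assume unit vectors behave as expected.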
Hi, @alfred-liu96! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you reported is about the precision of the L2 norm calculation in the HuggingFaceEmbeddings.embed_query
function. It seems that when converting an array to a list, the numbers become slightly larger. You mentioned in a comment that when using this embedding in FAISS vector store, the relevance_score obtained cannot be limited between 0 and 1.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your contribution to the LangChain project!