Can I use L2 distance to compare two embeddings created by e5-base-v2?

Open · weiZhenkun opened this issue 11 months ago · 3 comments

I am using the e5-base-v2 model. The model card at https://huggingface.co/intfloat/e5-base-v2 says that its cosine similarity scores are distributed roughly between 0.7 and 1.0.

How I use the e5-base-v2 model:

1. Get two embeddings from e5-base-v2.
2. Normalize them with `torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)`.
3. Compare the two embeddings using L2 distance.
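The three steps above can be sketched in plain Python (no torch) so the arithmetic is visible. This is a minimal sketch under stated assumptions: the input vectors are hypothetical stand-ins for real e5-base-v2 embeddings, and `l2_normalize` only mirrors what `torch.nn.functional.normalize(x, p=2, dim=1)` does to a 2-D batch (divide each row by its L2 norm, clamped below by `eps`); it is not the torch function itself.

```python
import math

# Hypothetical stand-ins for e5-base-v2 sentence embeddings. Real ones come from
# the model; these short vectors only illustrate the arithmetic.
Vector = list[float]


def l2_normalize(rows: list[Vector], eps: float = 1e-12) -> list[Vector]:
    """Scale each row to unit L2 norm, mirroring
    torch.nn.functional.normalize(x, p=2, dim=1) on a 2-D batch."""
    out = []
    for row in rows:
        # Clamp the norm so an all-zero row does not cause division by zero.
        norm = max(math.sqrt(sum(v * v for v in row)), eps)
        out.append([v / norm for v in row])
    return out


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; it divides by both norms, so it also works on raw vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Step 1 (stand-in): two raw "embeddings" as a batch of rows.
raw = [[0.2, 0.9, 0.4], [0.3, 0.8, 0.6]]
# Step 2: normalize each row to unit length.
a, b = l2_normalize(raw)
# Step 3: compare the normalized vectors with L2 distance.
dist = l2_distance(a, b)
# Cosine similarity computed on the raw (unnormalized) vectors.
cos = cosine_similarity(raw[0], raw[1])

# For unit vectors, dist**2 == 2 - 2 * cos, so both metrics rank pairs identically.
print(f"L2 distance: {dist:.4f}")
print(f"cosine:      {cos:.4f}")
print(f"2 - 2*cos:   {2 - 2 * cos:.4f}  vs  dist**2: {dist ** 2:.4f}")
```

The last printed line shows the two quantities agreeing: after normalization, a smaller L2 distance always means a higher cosine similarity, so the two give the same ranking. Cosine similarity itself divides by the norms, so it returns the same value whether or not the vectors were normalized first; normalizing mainly matters if you compare with a plain dot product or with L2 distance.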

My questions:

1. Is this the right approach? Can I use L2 distance to compare two embeddings created by e5-base-v2?
2. If I use cosine similarity instead, do I need to normalize the embeddings?
3. If e5-base-v2 cosine similarity scores generally fall within [0.7, 1.0], is there a suitable threshold range for deciding that two texts are relatively similar?
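A short worked identity (standard linear algebra, not taken from the model card) relates the two metrics and translates the quoted 0.7 to 1.0 cosine range into L2 distances between the normalized embeddings $\hat a = a/\|a\|_2$ and $\hat b = b/\|b\|_2$:

$$
\|\hat a - \hat b\|_2^2 = \|\hat a\|_2^2 + \|\hat b\|_2^2 - 2\,\hat a^\top \hat b = 2 - 2\cos(a, b)
$$

$$
\cos(a, b) \in [0.7,\ 1.0] \;\Longleftrightarrow\; \|\hat a - \hat b\|_2 \in \left[0,\ \sqrt{0.6}\,\right] \approx [0,\ 0.775]
$$

Any cosine threshold $t$ therefore corresponds to the L2 threshold $\sqrt{2 - 2t}$. The identity does not say which $t$ separates similar from dissimilar texts; because the scores cluster in a narrow band, a cut-off would presumably need to be tuned on labeled pairs from one's own data.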

@intfloat Can you help me?

weiZhenkun, Mar 14 '24 04:03