
Phrase embeddings in context

Open jnferfer opened this issue 1 year ago • 2 comments

Hi,

I need to get the embeddings of a word or a phrase within a sentence. This sentence is the context of the word/phrase.

For example, I need the different embedding values of big apple in these two sentences:

I'm living in the Big Apple since 2012
I ate a big apple yesterday

When using model.encode(), I can set the output_value parameter to token_embeddings to get token embeddings. However, I don't know how to properly map the output vectors back to the tokens corresponding to the big apple text. Is there a straightforward approach for this?

Thanks!

jnferfer avatar Feb 12 '24 15:02 jnferfer

You may first check the tokenization of the sentences, record the indices of the tokens for the desired words, e.g., big apple, and then look up the token embeddings at those indices.

hongjin-su avatar Apr 12 '24 17:04 hongjin-su
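The index-mapping step above can be sketched as follows. This is a minimal, self-contained illustration: the offsets list is a hand-written stand-in for what a Hugging Face fast tokenizer returns with return_offsets_mapping=True, and the token_embeddings array is a random placeholder for the model's actual per-token output. The helper name token_indices_for_span is my own, not part of any library.

```python
import numpy as np

def token_indices_for_span(offsets, span_start, span_end):
    """Return indices of tokens whose character spans overlap
    [span_start, span_end). Special tokens with empty (0, 0)
    offsets are skipped via the e > s check."""
    return [i for i, (s, e) in enumerate(offsets)
            if s < span_end and e > span_start and e > s]

sentence = "I'm living in the Big Apple since 2012"
phrase = "Big Apple"
start = sentence.index(phrase)
end = start + len(phrase)

# Illustrative offsets for a hypothetical tokenization of the sentence
# above; entries 0 and 9 stand for special tokens like [CLS]/[SEP].
offsets = [(0, 0), (0, 3), (4, 10), (11, 13), (14, 17),
           (18, 21), (22, 27), (28, 33), (34, 38), (0, 0)]

idx = token_indices_for_span(offsets, start, end)
# idx now holds the positions of the "Big" and "Apple" tokens.

# Placeholder for real token embeddings from the model; average the
# selected rows to get a single vector for the phrase.
token_embeddings = np.random.rand(len(offsets), 768)
phrase_vec = token_embeddings[idx].mean(axis=0)
```

In practice you would get offsets from your tokenizer and token_embeddings from the encode call, then apply the same selection and averaging.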

Thanks! Then, if I want to get a single embedding for "big apple", how should I proceed? I'm trying to get the average embedding of "big" and "apple", but I sometimes get odd results when comparing the average embedding against others.

jnferfer avatar Apr 21 '24 10:04 jnferfer
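One common source of odd comparison results when averaging subword vectors is mixing normalized and unnormalized embeddings: if the vectors you compare against are unit-length (as sentence embeddings often are), the raw mean of token vectors should be L2-normalized before taking cosine similarity. A minimal sketch, using toy 3-dimensional placeholder vectors rather than real model output:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for the token embeddings of "big" and "apple";
# real values would come from the model's token_embeddings output.
big = np.array([1.0, 0.0, 0.0])
apple = np.array([0.0, 1.0, 0.0])

# Mean-pool the subword vectors, then L2-normalize the result so it is
# on the same scale as unit-normalized embeddings being compared against.
phrase = (big + apple) / 2
phrase = phrase / np.linalg.norm(phrase)
```

Cosine similarity itself is scale-invariant, so normalization mainly matters if you later compare with dot products or mix pooled vectors with pre-normalized ones.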