instructor-embedding
Phrase embeddings in context
Hi,

I need to get the embedding of a word or phrase within a sentence, where the sentence provides the context of the word/phrase. For example, I need two different embedding values for "big apple" in these two sentences:

I'm living in the Big Apple since 2012
I ate a big apple yesterday

When using model.encode(), I can set the parameter output_value to "token_embeddings" to get token embeddings. However, I don't know how to properly map the output vectors to the target tokens corresponding to the "big apple" text. Is there a straightforward approach for this?

Thanks!
You may first check the tokenization of the sentences, record the indices of the desired words (e.g., "big apple"), and then pick out the token embeddings at those indices.
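As a rough sketch of that index-mapping step (the token list and tiny 3-d embedding vectors below are made up for illustration; a real tokenizer may lowercase or split words into subwords, so inspect its actual output before hard-coding a target sequence):

```python
def find_subspan(tokens, target):
    """Return (start, end) indices of the first occurrence of the
    target token sequence inside tokens, or None if absent."""
    n, m = len(tokens), len(target)
    for i in range(n - m + 1):
        if tokens[i:i + m] == target:
            return i, i + m
    return None

# Hypothetical tokenization of "I ate a big apple yesterday".
tokens = ["i", "ate", "a", "big", "apple", "yesterday"]
# One token embedding per token (tiny 3-d vectors for illustration).
token_embeddings = [
    [0.1, 0.0, 0.2],
    [0.0, 0.3, 0.1],
    [0.2, 0.1, 0.0],
    [0.4, 0.2, 0.6],
    [0.6, 0.4, 0.2],
    [0.1, 0.1, 0.1],
]

start, end = find_subspan(tokens, ["big", "apple"])
phrase_vectors = token_embeddings[start:end]  # embeddings for "big", "apple"
print(start, end)            # 3 5
print(len(phrase_vectors))   # 2
```

With a real model you would replace the hard-coded lists with the tokenizer's output for the sentence and the token embeddings returned by model.encode().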
Thanks! Then, if I want a single embedding for "big apple", how should I proceed? I'm taking the average of the embeddings of "big" and "apple", but I sometimes get odd results when comparing that averaged embedding against others.