sense2vec

How to do vector arithmetic?

Open sleepandpancakes opened this issue 2 years ago • 4 comments

How do I use the API to do manual vector arithmetic on vectorized words/phrases? For example, adding an arbitrary vector to the vector corresponding to a word and returning the result? Or linear interpolation between two vectorized words and converting the result back to the corresponding word?

sleepandpancakes · Oct 13 '23 02:10

You can obtain the vectors like this (see the example in the README):

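```python
import spacy
from sense2vec import Sense2VecComponent  # as in the README; makes sure the "sense2vec" factory is registered

nlp = spacy.load("en_core_web_sm")
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("/path/to/s2v_reddit_2015_md")

doc = nlp("A sentence about natural language processing.")
# doc[3:6] is the span "natural language processing"; s2v_vec may be None if the phrase isn't in the table
vector = doc[3:6]._.s2v_vec
```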

You can then use e.g. NumPy to do any vector arithmetic you need on the embeddings you obtained.
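For the two operations in the question (adding an arbitrary offset vector, and linear interpolation between two phrases), a minimal NumPy sketch might look like the following. The helper names `add_offset` and `lerp` are hypothetical, and the toy 3-dimensional arrays only stand in for real `s2v_vec` results, which have the dimensionality of whatever vectors you loaded.

```python
import numpy as np


def add_offset(vec: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Add an arbitrary offset vector to a word/phrase vector (hypothetical helper)."""
    if vec.shape != offset.shape:
        raise ValueError(f"shape mismatch: {vec.shape} vs {offset.shape}")
    return vec + offset


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linearly interpolate between a and b: t=0 gives a, t=1 gives b."""
    return (1.0 - t) * a + t * b


# Toy stand-ins (assumption): in practice these would come from e.g.
# doc[3:6]._.s2v_vec, after checking that the result is not None.
natural_lp = np.array([1.0, 0.0, 2.0], dtype=np.float32)
machine_learning = np.array([3.0, 4.0, 0.0], dtype=np.float32)

shifted = add_offset(natural_lp, np.array([0.5, 0.5, 0.5], dtype=np.float32))
midpoint = lerp(natural_lp, machine_learning, 0.5)
print(shifted)   # [1.5 0.5 2.5]
print(midpoint)  # [2. 2. 1.]
```

The results are plain NumPy arrays and will generally not match any key in the table exactly; turning them back into a word requires a nearest-neighbor lookup.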

rmitsch · Oct 13 '23 11:10

Thank you. Is there a way to take an arbitrary vector and find the closest corresponding word in the vocab? I'm still having a bit of trouble understanding how I would do this.

sleepandpancakes · Oct 14 '23 03:10

What you're looking for is a nearest-neighbor search. sense2vec doesn't expose this in its public API, but there are many tools for it, sorted here by complexity/overhead/capabilities from low to high:
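Independently of which dedicated tool you pick, the simplest baseline is an exact, brute-force cosine-similarity search: normalize every vector, take dot products with the normalized query, and return the top-scoring keys. The sketch below assumes you have already collected the table's keys and vectors into a list and a 2-D array (the sense2vec README describes iterating over a `Sense2Vec` table as `(key, vector)` pairs, but verify that against your installed version). The function `nearest_keys` and the toy keys are hypothetical.

```python
import numpy as np


def nearest_keys(query, keys, matrix, n=3):
    """Return the n keys whose vectors have the highest cosine similarity to query.

    Exact brute-force search over every row of `matrix` (shape: len(keys) x dim).
    """
    matrix = np.asarray(matrix, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalize rows and query so the dot product equals cosine similarity;
    # the small floor avoids division by zero for all-zero vectors.
    row_norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit_rows = matrix / np.maximum(row_norms, 1e-12)
    unit_query = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = unit_rows @ unit_query
    # Indices of the n best scores, highest first.
    n = min(n, len(keys))
    top = np.argsort(-scores)[:n]
    return [(keys[i], float(scores[i])) for i in top]


# Toy table (assumption): keys in sense2vec's "phrase|SENSE" style.
keys = ["cat|NOUN", "dog|NOUN", "car|NOUN"]
vectors = np.array(
    [[1.0, 0.0, 0.0],
     [0.9, 0.1, 0.0],
     [0.0, 0.0, 1.0]],
    dtype=np.float32,
)

# E.g. a shifted or interpolated vector from the arithmetic step.
query = np.array([0.8, 0.2, 0.0], dtype=np.float32)
print(nearest_keys(query, keys, vectors, n=2))  # dog|NOUN ranks first, then cat|NOUN
```

This costs one pass over the whole table per query, which is fine for experiments; for large tables or many queries, the dedicated tools mentioned above scale better. If the query is itself a vector from the table, its own key will rank first, so you may want to filter it out.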

rmitsch · Oct 16 '23 07:10

Thank you again.

sleepandpancakes · Oct 23 '23 23:10