nlp-tutorials

'Word2VecKeyedVectors' object has no attribute 'get_mean_vector'

Open shiv425 opened this issue 3 years ago • 4 comments

While converting the tokens of a complete sentence to a vector in the preprocess_and_vectorize method, I got the error "'Word2VecKeyedVectors' object has no attribute 'get_mean_vector'".

shiv425 avatar Nov 03 '22 11:11 shiv425

I tried converting each token to a vector and then taking the mean with np.mean, but while converting df['Text'] to vector form I get errors like "Key 'u.s.-based' not present", "Key ' ' not present", "Key '2018' not present", etc. Please help.

shiv425 avatar Nov 03 '22 11:11 shiv425
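The "Key ... not present" errors suggest the lookup is hitting tokens that are not in the model's vocabulary. Here is a minimal sketch of separating known from unknown tokens before averaging. It assumes `wv` supports the `in` operator (gensim's KeyedVectors does); a plain dict with made-up vectors stands in for the real model, and `split_by_vocab` is a hypothetical helper name.

```python
# Stand-in for the real model: any object supporting `key in wv` behaves the
# same way here. The vectors are invented for illustration only.
wv = {"company": [0.1, 0.2], "report": [0.3, 0.4]}

def split_by_vocab(tokens, wv):
    """Return (known, missing) tokens, dropping whitespace-only tokens."""
    known, missing = [], []
    for token in tokens:
        if not token.strip():
            continue  # tokens such as ' ' never have a vector
        (known if token in wv else missing).append(token)
    return known, missing

known, missing = split_by_vocab(["company", "u.s.-based", " ", "2018", "report"], wv)
print(known)    # ['company', 'report']
print(missing)  # ['u.s.-based', '2018']
```

Printing the `missing` list for a few rows of df['Text'] is a quick way to see how much of the text the model cannot represent.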

I think he used an old version of the gensim library; a lot of attributes changed from 3.8 to 4.0. I'm also facing the same issue. I tried a couple of things, but they didn't help at all. It's a poorly documented library, to be honest: I've been searching for hours and couldn't find anything useful.

elandil2 avatar Nov 03 '22 23:11 elandil2
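As far as I can tell, `get_mean_vector` only exists on KeyedVectors in newer gensim 4.x releases, so one option is to check for it at runtime and fall back to a manual average. This is a sketch under that assumption, not the tutorial's own code: `mean_vector` and `FakeOldVectors` are hypothetical names, and the fake class only imitates a model that lacks the method. Note that the built-in helper normalizes vectors by default, so its result can differ from a plain `np.mean`.

```python
import numpy as np

def mean_vector(wv, tokens):
    """Average the vectors of the tokens that wv knows about."""
    if hasattr(wv, "get_mean_vector"):
        # Newer gensim: use the built-in helper (normalizes by default).
        return wv.get_mean_vector(tokens)
    # Older gensim: average manually, skipping out-of-vocabulary tokens.
    vectors = [wv[t] for t in tokens if t in wv]
    return np.mean(vectors, axis=0)

# Tiny stand-in for an old-style model (it has no get_mean_vector attribute).
class FakeOldVectors(dict):
    pass

old_wv = FakeOldVectors(good=np.array([1.0, 3.0]), news=np.array([3.0, 5.0]))
result = mean_vector(old_wv, ["good", "news", "u.s.-based"])
print(result)  # [2. 4.]
```

Upgrading gensim is the other route, but the fallback keeps the notebook working on whichever version is installed.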

`def preprocess_and_vectorize(text):
    # remove stop words and lemmatize the text
    doc = nlp(text)
    filtered_tokens = []
    arr = []
    for token in doc:
        if token.is_stop or token.is_punct:
            continue
        filtered_tokens.append(token.lemma_)
    for token in filtered_tokens:
        try:
            arr.append(wv[token])
        except KeyError:
            # the token has no vector in wv, so skip it
            continue
    return np.mean(arr, axis=0)`

I used this code. The try/except is there because many words have no vector in wv.

shiv425 avatar Nov 04 '22 05:11 shiv425
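One thing to watch when skipping missing words: if every token in a text is skipped, `np.mean` receives an empty list and returns `nan` with a warning. A small guard, sketched here under the assumption that a zero vector is an acceptable placeholder (`safe_mean` and its `dim` parameter are hypothetical; with a real model, `wv.vector_size` should give the dimension):

```python
import numpy as np

def safe_mean(vectors, dim):
    """Mean of a list of vectors, or a zero vector if the list is empty."""
    if len(vectors) == 0:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

empty = safe_mean([], dim=3)
filled = safe_mean([np.array([1.0, 2.0, 3.0]), np.array([3.0, 4.0, 5.0])], dim=3)
print(empty)   # [0. 0. 0.]
print(filled)  # [2. 3. 4.]
```

Dropping such rows from the DataFrame instead of keeping a zero vector may be the better choice, depending on the classifier.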

Solution to the problem

This is the alternative I have found for this problem and it's working

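    import spacy
    import numpy as np

    nlp = spacy.load("en_core_web_lg")

    # wv is assumed to be the already-loaded gensim KeyedVectors model.
    def preprocess_and_vectorize(text):
        doc = nlp(text)
        filtered_tokens = []
        for token in doc:
            if token.is_punct or token.is_stop:
                continue
            filtered_tokens.append(token.lemma_)
        # Skip tokens without a vector; axis=0 keeps one value per dimension.
        return np.mean([wv[t] for t in filtered_tokens if t in wv], axis=0)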

meet5398 avatar May 05 '23 09:05 meet5398