magpie
Why not also remove stop-words in text processing?
```python
def get_all_words(self):
    """ Return all words tokenized, in lowercase and without punctuation """
    return [w.lower() for w in word_tokenize(self.text)
            if w not in string.punctuation]
```
I found that this function only removes punctuation from the text, but other kinds of low-information tokens, such as stop-words, are not removed.
e.g.:

```python
from nltk.corpus import stopwords

words = stopwords.words('english')
```
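For illustration, here is a minimal, self-contained sketch of what optional stop-word filtering could look like. It is a hypothetical variant, not magpie's actual code: it uses a small hardcoded stop-word subset in place of NLTK's full `stopwords.words('english')` list, and a naive split in place of `word_tokenize`, so it runs without NLTK downloads.

```python
import string

# Small illustrative subset; NLTK's stopwords.words('english')
# returns the full list of English stop-words.
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "and", "is", "it", "to"}

def get_all_words(text, remove_stopwords=False):
    """Tokenize, lowercase, drop punctuation; optionally drop stop-words.

    A naive whitespace split plus punctuation stripping stands in
    for nltk.word_tokenize here.
    """
    tokens = [w.strip(string.punctuation).lower() for w in text.split()]
    tokens = [w for w in tokens if w]  # drop punctuation-only tokens
    if remove_stopwords:
        tokens = [w for w in tokens if w not in STOP_WORDS]
    return tokens

print(get_all_words("The cat sat on the mat.", remove_stopwords=True))
# → ['cat', 'sat', 'mat']
```

Whether to enable such a flag depends on the downstream model, which is exactly the trade-off discussed below.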
yeah, we want to leave the stopwords in for word2vec to work better.