word2vec-pytorch

SubSampling formula

Open francesco-mollica opened this issue 2 years ago • 0 comments

Why is the extra term (t / f) added to sqrt(t / f) in this formula for discards?

t = 0.0001
f = np.array(list(self.word_frequency.values())) / self.token_count
self.discards = np.sqrt(t / f) + (t / f)
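
For comparison, here is a minimal sketch of the two expressions side by side. It assumes the value is used as a keep probability (a word is kept when a uniform random number falls below it), which is not visible in the snippet above. The function names and the sample frequencies are made up for illustration. The paper's discard probability 1 - sqrt(t / f) corresponds to a keep probability of sqrt(t / f), while the original word2vec C code, as far as I can tell, uses (sqrt(f / t) + 1) * t / f, which expands to sqrt(t / f) + t / f.

```python
import math


def keep_prob_paper(f, t=1e-4):
    """Keep probability implied by the paper's discard formula 1 - sqrt(t / f)."""
    return min(1.0, math.sqrt(t / f))


def keep_prob_with_extra_term(f, t=1e-4):
    """sqrt(t / f) + t / f, the expression from the snippet, capped at 1."""
    return min(1.0, math.sqrt(t / f) + t / f)


def keep_prob_c_style(f, t=1e-4):
    """(sqrt(f / t) + 1) * t / f, algebraically the same as the version above."""
    return min(1.0, (math.sqrt(f / t) + 1.0) * t / f)


# Hypothetical relative frequencies, from very common to very rare words.
print("f        paper    with t/f")
for f in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    print(f"{f:.0e}    {keep_prob_paper(f):.4f}   {keep_prob_with_extra_term(f):.4f}")
```

Under these assumptions, the extra term only slightly raises the keep probability for frequent words, and it moves the point where a word is always kept from f = t to roughly f = 2.6 * t.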

francesco-mollica, Nov 24 '21 16:11