word2vec-pytorch

negative_sampling

Open: Mayar2009 opened this issue 4 years ago • 1 comment

Hi! The negative sampling code currently has:

    pow_frequency = np.array(list(self.word_frequency.values())) ** 0.5

Shouldn't it be:

    pow_frequency = np.array(list(self.word_frequency.values())) ** 0.75

Mayar2009, Feb 23 '20 11:02

Hi! It does not matter too much. My implementation comes from fastText:

    void Model::initTableNegatives(const std::vector<int64_t>& counts) {
      real z = 0.0;
      for (size_t i = 0; i < counts.size(); i++) {
        z += pow(counts[i], 0.5);
      }
      for (size_t i = 0; i < counts.size(); i++) {
        real c = pow(counts[i], 0.5);
        for (size_t j = 0; j < c * NEGATIVE_TABLE_SIZE / z; j++) {
          negatives.push_back(i);
        }
      }
      std::shuffle(negatives.begin(), negatives.end(), rng);
    }

Andras7, Aug 24 '20 17:08
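
For readers comparing the two exponents, here is a minimal sketch (not code from this repository) of how the power affects the negative sampling distribution. The helper `negative_sampling_probs` and the word counts are hypothetical and chosen only for illustration. The 0.75 exponent is the one used in the original word2vec paper, while the fastText snippet quoted in this thread uses 0.5.

```python
import numpy as np


def negative_sampling_probs(counts, power):
    """Raise raw word counts to `power` and normalise them to sum to 1.

    Hypothetical helper for illustration; it mirrors the
    `pow_frequency` line quoted in the issue, followed by normalisation.
    """
    weights = np.asarray(counts, dtype=np.float64) ** power
    return weights / weights.sum()


# Made-up counts for four words, from very frequent to very rare.
counts = [1000, 100, 10, 1]

for power in (1.0, 0.75, 0.5):
    probs = negative_sampling_probs(counts, power)
    # A smaller exponent flattens the distribution, so rare words
    # are drawn as negatives more often.
    print(power, np.round(probs, 4))
```

With these counts, the 0.5 exponent gives rare words a somewhat larger share of the negative samples than 0.75 does, but both flatten the raw unigram distribution in the same direction, which fits the remark that the choice does not matter too much.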