word2vec-pytorch

Extremely simple and fast word2vec implementation with Negative Sampling + Sub-sampling

8 word2vec-pytorch issues (sorted by recently updated)

Why add `(t / f)` in this formula for discards?

```python
t = 0.0001
f = np.array(list(self.word_frequency.values())) / self.token_count
self.discards = np.sqrt(t / f) + (t / f)
```
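For context on this question: the extra `t / f` term matches the formula in the original word2vec C implementation, where the keep probability is `(sqrt(f/t) + 1) * (t/f) = sqrt(t/f) + t/f`, rather than the simplified `sqrt(t/f)` given in the Mikolov et al. paper. A minimal sketch of the difference, using made-up word frequencies (the array `f` below is an assumption, not data from the repo):

```python
import numpy as np

t = 1e-4  # sub-sampling threshold, as in the snippet above
f = np.array([0.1, 0.01, 1e-4, 1e-6])  # hypothetical relative word frequencies

# Simplified keep probability from the paper: sqrt(t / f)
keep_paper = np.sqrt(t / f)

# Form used in word2vec.c (and in this repo): sqrt(t / f) + t / f
keep_c_code = np.sqrt(t / f) + t / f

# The extra t/f term raises the keep probability slightly for every word;
# values above 1 just mean "always keep" (words at or below frequency t).
print(keep_paper)
print(keep_c_code)
```

In both variants, only words much more frequent than `t` are discarded with noticeable probability; the `+ t/f` term only changes the exact curve.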

https://github.com/Andras7/word2vec-pytorch/blob/36b93a503e8b3b5448abbc0e18f2a6bd3e017fc9/word2vec/data_reader.py#L102 I think `i + boundary` should include a `+ 1` to make the slice inclusive; otherwise the right context contains one token fewer in the resulting skipgrams.
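The off-by-one comes from Python slices excluding their stop index. A small sketch (toy data, not the repo's exact code) showing how `words[i - boundary : i + boundary]` yields an asymmetric window:

```python
words = ["a", "b", "c", "d", "e"]
i, boundary = 2, 2  # center word "c", intended window of 2 on each side

# Slice as reported in data_reader.py: stop index is exclusive,
# so the rightmost context token is dropped.
asymmetric = words[max(i - boundary, 0): i + boundary]

# With the suggested + 1, the window is symmetric around the center word.
symmetric = words[max(i - boundary, 0): i + boundary + 1]

print(asymmetric)  # ['a', 'b', 'c', 'd'] -- "e" is missing
print(symmetric)   # ['a', 'b', 'c', 'd', 'e']
```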

Hi @Andras7, first I want to thank you for providing this code; it really is a big help. I ran into this error when trying to train the model: Traceback...

Hi! Shouldn't `pow_frequency = np.array(list(self.word_frequency.values())) ** 0.5` be `pow_frequency = np.array(list(self.word_frequency.values())) ** 0.75`?
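For background on why the exponent matters: Mikolov et al. raise unigram counts to the 3/4 power when building the negative-sampling noise distribution, which flattens it so rare words are sampled more often than their raw frequency would suggest. A sketch with hypothetical counts (the numbers below are assumptions for illustration):

```python
import numpy as np

counts = np.array([900.0, 90.0, 10.0])  # toy word counts

def noise_dist(counts, power):
    """Normalize counts raised to `power` into a sampling distribution."""
    p = counts ** power
    return p / p.sum()

print(noise_dist(counts, 1.0))   # raw unigram: dominated by frequent words
print(noise_dist(counts, 0.75))  # the paper's 3/4 power: rare words boosted
print(noise_dist(counts, 0.5))   # 0.5 flattens even more aggressively
```

So 0.5 and 0.75 both dampen frequent words; the issue is simply that 0.75 is the value the paper recommends.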

This is not an issue. I just want to ask why you use `running_loss = running_loss * 0.9 + loss.item() * 0.1` for monitoring the loss during training. Do you have any special...
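For readers wondering about that line: it is an exponential moving average, a common way to smooth noisy per-batch losses for display. A minimal sketch with a made-up loss sequence (not taken from the repo):

```python
# Hypothetical per-batch losses that bounce between 5 and 1.
noisy_losses = [5.0, 1.0, 5.0, 1.0, 5.0, 1.0]

running = noisy_losses[0]  # initialize with the first observed loss
smoothed = []
for loss in noisy_losses:
    # Exponential moving average: 90% old value, 10% new observation.
    running = running * 0.9 + loss * 0.1
    smoothed.append(running)

# The smoothed curve varies far less than the raw 1 <-> 5 swings,
# making the overall trend easier to read in training logs.
print(smoothed)
```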

Hello Andras7, Thank you for this fast and effective implementation of word2vec. We have forked your repository and added some augmentations for a research project, and would like to properly...

Hi @Andras7, I reorganized your code a little bit to make it easily installable with pip. You can install it with: `pip install git+https://github.com/marta-sd/word2vec-pytorch.git` You can take a look...

Hello! Thanks for your code! Have you observed the loss? I downloaded the code and ran it, but the loss doesn't seem to converge. It descends rapidly at first,...