
Why zero out embeddings for special words if they are absent in vocab

Silenthinker opened this issue on Dec 20, 2017 · 2 comments

Hi,

I noticed that in main.py, you zero out the embeddings for special words if they are absent in the vocabulary:

# zero out the embeddings for padding and other special words if they are absent in vocab
for idx, item in enumerate([Constants.PAD_WORD, Constants.UNK_WORD, Constants.BOS_WORD, Constants.EOS_WORD]):
    emb[idx].zero_()

Is there any reason for doing so? Why not use random normal vectors?

Thanks.

Silenthinker commented on Dec 20, 2017

Hi @Silenthinker

As far as I remember, when initialising the embeddings, I realised that PAD_WORD needs to be zeroed out. At the time, I was unsure what to do with the other special words, so I left them zeroed out as well. I believe you can try initialising them normally; it should be fine.
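Roughly, what I have in mind is something like the following. This is only an untested sketch that reuses emb and Constants from main.py, and the std of 0.05 is an arbitrary choice, not something the code prescribes:

import torch

# Keep PAD_WORD at zero so padded positions contribute nothing,
# but give the other special tokens small random-normal vectors.
special_words = [Constants.PAD_WORD, Constants.UNK_WORD,
                 Constants.BOS_WORD, Constants.EOS_WORD]
for idx, word in enumerate(special_words):
    if word == Constants.PAD_WORD:
        emb[idx].zero_()
    else:
        emb[idx].normal_(mean=0, std=0.05)  # std chosen arbitrarily for this sketch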

Do let me know if you get a chance to try out random normal initialization!

dasguptar commented on Dec 21, 2017

Thanks for your reply. I'll try it out.

However, the role of PAD_WORD is still unclear to me, since I couldn't find anywhere it is actually used to pad sentences. Did I miss it somewhere? For context, the kind of usage I had in mind is shown in the sketch below.
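This is a hypothetical sketch of the usual padding pattern, not code taken from this repo, and it assumes the PAD index is 0:

import torch
import torch.nn as nn

PAD_IDX = 0  # assumption for this sketch

# Pad variable-length index sequences to a common length with the PAD index,
# and tell nn.Embedding to keep that row at zero.
sentences = [torch.tensor([5, 8, 2]), torch.tensor([7, 3])]
batch = nn.utils.rnn.pad_sequence(sentences, batch_first=True,
                                  padding_value=PAD_IDX)
embedding = nn.Embedding(num_embeddings=100, embedding_dim=50,
                         padding_idx=PAD_IDX)
vectors = embedding(batch)  # padded positions map to the zero vector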

Thanks.

Silenthinker commented on Dec 22, 2017