Jamie J. Seol

Results: 9 comments by Jamie J. Seol

Uh, I wanted the row at `padding_idx` to be 0, but the code initializes the first row even if `padding_idx` is not zero. Would you PR...

`padding_idx` denotes, literally, the index of the padding word. The padding word should (usually) have a zero vector, so it should be initialized that way. About your code, I'm not...
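To make the point about `padding_idx` concrete, here is a minimal sketch in plain Python. It is not the code from the repository under discussion: the function name `init_embeddings`, its parameters, and the uniform initialization range are all assumptions chosen for illustration. It only shows the intended behavior, which is that the zeroed row is the one at `padding_idx`, not unconditionally row 0.

```python
import random


def init_embeddings(vocab_size, dim, padding_idx=None, seed=0):
    """Build a vocab_size x dim embedding table as a list of lists.

    Hypothetical helper for illustration only. All rows start with small
    random values; if padding_idx is given, that row is set to zeros.
    """
    rng = random.Random(seed)
    table = [
        [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for _ in range(vocab_size)
    ]
    if padding_idx is not None:
        # Zero the row at padding_idx itself, rather than always row 0.
        table[padding_idx] = [0.0] * dim
    return table


table = init_embeddings(vocab_size=5, dim=3, padding_idx=2)
```

With `padding_idx=2`, row 2 is the zero vector while row 0 keeps its random initialization.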

@GyeongJoHwang Normally, the "unknown word", or UNK token, denotes a rare word rather than padding, so making it 0 might be a solution, but usually we use a random vector or mean...

That would give a more precise loss value; however, I'm afraid that applying this might take too much training time. I quickly wrote some sample code for your idea. With...

If you have a good idea for the implementation, please go ahead and open a PR. Otherwise, I'll close this issue. Thank you.

@msummerfield Thanks for the detailed feedback! Awesome. The idea of using the 'faster' loss looks meaningful. The main reason I retained all the details is that the overall loss remains mathematically...

Thank you for the feedback. Can you provide a reduced, reproducible sample case? For example, a small dataset and a configuration for it.

Got it. Thanks. But actually, I still feel sorry about the rather poor quality of my PR, and I'm still not sure how to improve it.