Junseong Kim
@mapingshuo Sorry, it's my fault, haha. I just made that title in 5 seconds :) Thank you!! 👍
@coddinglxf I just solved that problem with `nn.NLLLoss(ignore_index=0)`, where 0 is equal to pad_index. Even if the target is 0 (unmasked_value), it doesn't affect the loss or its backpropagation.
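To make the `ignore_index` idea concrete, here is a rough plain-Python sketch of the averaging it implies: positions whose target equals the ignored index are dropped before the mean is taken. This is only an illustration, not PyTorch's implementation; `nll_loss_ignore_index`, `log_softmax`, and `PAD_INDEX` are hypothetical names made up for the example, and the logits are toy values.

```python
import math

PAD_INDEX = 0  # assumption for this sketch: target 0 marks padded / unmasked positions


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def nll_loss_ignore_index(log_probs, targets, ignore_index=PAD_INDEX):
    """Mean negative log-likelihood over positions whose target != ignore_index."""
    kept = [-row[t] for row, t in zip(log_probs, targets) if t != ignore_index]
    if not kept:
        # Every position was ignored; returning 0.0 is a choice made for this sketch.
        return 0.0
    return sum(kept) / len(kept)


# Three positions over a toy vocabulary of size 4; only the middle one is a real target.
log_probs = [
    log_softmax([1.0, 2.0, 3.0, 4.0]),
    log_softmax([0.5, 0.1, 2.0, 0.3]),
    log_softmax([4.0, 3.0, 2.0, 1.0]),
]
targets = [0, 2, 0]

loss = nll_loss_ignore_index(log_probs, targets)
print(loss)
```

Whatever the model predicts at the two positions with target 0 drops out of the average, so those positions contribute nothing to the loss.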
@coddinglxf That's what I thought at first, but I couldn't implement it efficiently in terms of GPU computation time. If you have any idea, please implement it and send a pull request, please :)...
@leon-cas Yes, #36 is solved, along with your question.
I'll update the vocab builder ASAP! Thanks!
Hmm? What do you mean by the output embedding? Do you mean the softmaxed output distribution?
Is there any benefit if we tie the weights of the two layers? If so, could you please point me to some references that use a similar architecture?
@jiqiujia @briandw Cool, I'll implement it in version 0.0.1a5, but it seems like solving #32 is a higher priority.
@iOSGeekerOfChina I haven't decided yet; I just started this project one hour ago, haha. Do you think using the dataset referred to in the paper is a good idea? Or have some other...