attention-is-all-you-need-keras

Maybe I found a point that should be changed

Open alphanlp opened this issue 6 years ago • 2 comments

```python
self.target_layer = TimeDistributed(Dense(o_tokens.num(), use_bias=False))
```

change to:

```python
self.target_layer = TimeDistributed(Dense(o_tokens.num(), activation='softmax', use_bias=False))
```

alphanlp avatar Dec 04 '18 07:12 alphanlp

It's very interesting: when I use softmax as proposed in the paper, the loss does not go down.

alphanlp avatar Dec 05 '18 01:12 alphanlp

The TF loss already contains a softmax. With that change, you would apply softmax twice.

lsdefine avatar Dec 05 '18 03:12 lsdefine
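
A minimal NumPy sketch of why the double softmax hurts training (assuming the loss applies its own softmax to the layer's output, as `tf.nn.sparse_softmax_cross_entropy_with_logits` does): applying softmax to an already-normalized distribution squashes it toward uniform, so the loss becomes much less sensitive to the logits and barely decreases.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])

once = softmax(logits)        # what the loss expects to compute itself
twice = softmax(once)         # double softmax: layer softmax + loss softmax

# The doubly-softmaxed distribution is much flatter (closer to uniform),
# so the cross-entropy gradient w.r.t. the logits is strongly attenuated.
print(once)   # peaked distribution
print(twice)  # flattened distribution
```

This is why the layer should output raw logits (no `activation='softmax'`) when the loss function already performs the softmax internally.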