attention-is-all-you-need-keras
I think I found a point that should be changed
self.target_layer = TimeDistributed(Dense(o_tokens.num(), use_bias=False))

should be changed to:

self.target_layer = TimeDistributed(Dense(o_tokens.num(), activation='softmax', use_bias=False))
It's very interesting: when I use softmax as proposed in the paper, the loss does not go down.
The TF loss already applies a softmax internally, so it expects raw logits. With this change, softmax gets applied twice.
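For reference, a minimal sketch of why the double softmax stalls training, assuming the loss is built on tf.nn.sparse_softmax_cross_entropy_with_logits (the exact loss in this repo may differ, but any "with_logits" loss behaves the same way):

import tensorflow as tf

# Toy output for one token over a 3-word vocabulary, and its true label.
logits = tf.constant([[2.0, 0.5, -1.0]])
labels = tf.constant([0])

# Correct: the Dense layer has no activation, so it emits raw logits,
# and the loss applies softmax internally exactly once.
loss_ok = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
print(loss_ok.numpy())  # ~0.24

# Broken: with activation='softmax' on the Dense layer, the loss receives
# probabilities and squashes them through softmax a second time. The
# resulting distribution is nearly uniform, so gradients are tiny and the
# loss barely moves.
probs = tf.nn.softmax(logits)
loss_double = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=probs)
print(loss_double.numpy())  # ~0.70, and much flatter gradients

So the fix is one or the other: keep the Dense layer linear and use a from_logits-style loss (as the repo does now), or add the softmax activation and switch to a loss that expects probabilities. Doing both is what makes the loss unable to go down.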