DeepResearch
Custom softmax AttentionWithContext class instead of TensorFlow implementation
According to Hierarchical Attention Networks for Document Classification, the attention mechanism uses a softmax function to obtain the attention weights. I think there is a bug in your AttentionWithContext class, in the call function:
def call(self, x, mask=None):
    uit = dot_product(x, self.W)

    if self.bias:
        uit += self.b

    uit = K.tanh(uit)
    ait = dot_product(uit, self.u)

    a = K.exp(ait)

    # apply mask after the exp. will be re-normalized next
    if mask is not None:
        # Cast the mask to floatX to avoid float64 upcasting in theano
        a *= K.cast(mask, K.floatx())

    # in some cases especially in the early stages of training the sum may be almost zero
    # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
    # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
    a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())

    a = K.expand_dims(a)
    weighted_input = x * a
    return K.sum(weighted_input, axis=1)
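For reference, the word-level attention in the paper (Yang et al., 2016) is, up to notation, where h_it are the encoder outputs and u_w is the context vector:

u_{it} = \tanh(W_w h_{it} + b_w)
\alpha_{it} = \frac{\exp(u_{it}^\top u_w)}{\sum_t \exp(u_{it}^\top u_w)}
s_i = \sum_t \alpha_{it} h_{it}

The second equation is exactly a softmax over the timesteps, which the code above implements by hand with exp followed by normalization.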
I think you should use
a = K.softmax(ait)
instead of
a = K.exp(ait)

# apply mask after the exp. will be re-normalized next
if mask is not None:
    # Cast the mask to floatX to avoid float64 upcasting in theano
    a *= K.cast(mask, K.floatx())

# in some cases especially in the early stages of training the sum may be almost zero
# and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
# a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())
I think the code performs the same operation, but K.softmax(ait) is provided by Keras (TensorFlow), so it should be safer to use.
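For illustration only, here is a minimal sketch of what call could look like with K.softmax; this is my own assumption of the change, not code from the repository. Note that the mask would have to be folded in before the softmax (here via a large negative additive bias) instead of being applied after the exp:

def call(self, x, mask=None):
    uit = dot_product(x, self.W)
    if self.bias:
        uit += self.b
    uit = K.tanh(uit)
    ait = dot_product(uit, self.u)

    if mask is not None:
        # assumption: suppress padded timesteps before the softmax by adding a
        # large negative value, so their attention weights come out (almost) zero
        ait += (1.0 - K.cast(mask, K.floatx())) * -1e9

    # softmax over the timestep axis replaces the manual exp / sum normalization
    a = K.softmax(ait)

    a = K.expand_dims(a)
    weighted_input = x * a
    return K.sum(weighted_input, axis=1)

Without a mask this produces the same weights as the exp-and-normalize version; with a mask, the additive-bias trick keeps padded positions near zero.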