hiersumm: Is MultiHeadpooling the same as in the paper?

Open AbnerCode opened this issue 5 years ago • 1 comments

hi @nlpyang

I have some questions.

In your paper, equation (15) is not used, so why do you propose it?

Equation (13) and equation (14):

I don't understand why you do this. Could you give some explanation? In addition, az and bz do not appear in your code.

        scores = self.linear_keys(key)
        value = self.linear_values(value)

        scores = shape(scores, 1).squeeze(-1)
        value = shape(value)
        # key_len = key.size(2)
        # query_len = query.size(2)
        #
        # scores = torch.matmul(query, key.transpose(2, 3))

        if mask is not None:
            mask = mask.unsqueeze(1).expand_as(scores)
            scores = scores.masked_fill(mask, -1e18)

You also do not compute the scores the way the paper describes. Why? Best wishes!

Sep 17 '19 13:09 AbnerCode

Equation 16 has a typo: it should use \hat{a}, not a.
You can think of a and b as the scores and values, which do appear in the code at https://github.com/nlpyang/hiersumm/blob/476e6bf9c716326d6e4c27d5b6878d0816893659/src/abstractive/attn.py#L241 and https://github.com/nlpyang/hiersumm/blob/476e6bf9c716326d6e4c27d5b6878d0816893659/src/abstractive/attn.py#L242
I don't see how this differs from the paper.
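
For readers following along, the correspondence between "scores and values" and a pooling step can be sketched as below. This is only an illustration in plain Python for a single head, not the code from attn.py: the function name `attention_pool` and the toy inputs are made up here, and the linear projections, the split into heads, and any normalization layers are left out. It assumes the scores are normalized with a softmax over tokens and then used to take a weighted sum of the values.

```python
import math

def attention_pool(scores, values, mask=None):
    """Pool a sequence of value vectors into one vector (single head).

    Hypothetical helper for illustration only.
    scores: list of floats, one per token (the role of a)
    values: list of equal-length float lists (the role of b)
    mask:   optional list of bools, True marks padding to ignore
    """
    if mask is not None:
        # Same idea as masked_fill(mask, -1e18): padding gets ~0 weight.
        scores = [-1e18 if m else s for s, m in zip(scores, mask)]
    # Numerically stable softmax over tokens (the normalized scores).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, pooled

# Toy input: the third token is padding and should not contribute.
weights, pooled = attention_pool(
    scores=[0.0, 0.0, 5.0],
    values=[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    mask=[False, False, True],
)
```

In this sketch the masked token ends up with zero weight even though its raw score is the largest, which is the effect the `masked_fill` call in the quoted snippet is after.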

Sep 17 '19 13:09 nlpyang
