ParlAI icon indicating copy to clipboard operation
ParlAI copied to clipboard

TransformerEncoderLayer change lead to different behavior on legacy code

Open dexterju27 opened this issue 5 years ago • 3 comments

A heads up for whom might be using Transformer encoder. The transformer Encoder Layer forward loop use to be:

tensor = tensor + self.dropout(self.attention(tensor, mask=mask))
tensor = _normalize(tensor, self.norm1)
tensor = tensor + self.dropout(self.ffn(tensor))
 tensor = _normalize(tensor, self.norm2)
 tensor *= mask.unsqueeze(-1).type_as(tensor)
 return tensor

While the current in master: https://github.com/facebookresearch/ParlAI/blob/a3c4c3ef68c616e1bb6cc0796313356117a6bcd4/parlai/agents/transformer/modules.py#L588 As the default variant is None, the default forward behavior is changed from the past when no variant is given https://github.com/facebookresearch/ParlAI/blob/a3c4c3ef68c616e1bb6cc0796313356117a6bcd4/parlai/agents/transformer/modules.py#L602

        if self.variant == 'prelayernorm':
            tensor = _normalize(tensor, self.norm1)
        attended_tensor, _ = self.attention(tensor, mask=mask)
        tensor = residual + self.dropout(attended_tensor)
        if self.variant == 'aiayn' or self.variant == 'xlm':
            tensor = _normalize(tensor, self.norm1)
        residual = tensor
        if self.variant == 'prelayernorm':
            tensor = _normalize(tensor, self.norm2)
        tensor = residual + self.dropout(self.ffn(tensor))
        if self.variant == 'aiayn' or self.variant == 'xlm':
            tensor = _normalize(tensor, self.norm2)
        tensor *= mask.unsqueeze(-1).type_as(tensor)
        return tensor

It requires extra investigation when we load old model back. Especially for those directly using TransformerEncoderLayers.

Suggestion: Change that default to 'xlm'.

dexterju27 avatar Apr 24 '20 20:04 dexterju27

Dexter, please embed multilines when you quote code, the issues are difficult to read without this.

stephenroller avatar Apr 25 '20 22:04 stephenroller

I don't think a default variant should be xlm, it should be aiayn if anything, but really we should just make None behave the same as aiayn.

stephenroller avatar Apr 25 '20 22:04 stephenroller

This issue has not had activity in 30 days. Marking as stale.

github-actions[bot] avatar Jun 02 '20 00:06 github-actions[bot]

closing as i'm not entirely sure what the issue is here; please reopen if this is breaking

klshuster avatar Nov 09 '22 22:11 klshuster