hufuzhipeng

Results: 1 issue by hufuzhipeng

According to the Transformer paper, it seems that we can change x = x + self.sa(self.ln1(x)); x = x + self.ffwd(self.ln2(x)) to x = self.ln1(x + self.sa(x)); x...
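To make the two orderings concrete, here is a minimal pure-Python sketch of a single residual step in each style. It is an illustration only, not code from the repository: `layer_norm` omits the learned scale and shift, `toy_sublayer` is a hypothetical stand-in for `self.sa` or `self.ffwd`, and the post-LN form follows the `LayerNorm(x + Sublayer(x))` pattern from the original Transformer paper.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance.

    Simplified: no learned scale/shift parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_sublayer(x):
    """Hypothetical stand-in for self.sa / self.ffwd (any vector -> vector map)."""
    return [2.0 * v + 1.0 for v in x]


def add(a, b):
    """Element-wise sum, i.e. the residual connection."""
    return [u + v for u, v in zip(a, b)]


def pre_ln_step(x, sublayer=toy_sublayer):
    # Pre-LN: x = x + sublayer(ln(x))
    # The residual path carries x through without normalization.
    return add(x, sublayer(layer_norm(x)))


def post_ln_step(x, sublayer=toy_sublayer):
    # Post-LN: x = ln(x + sublayer(x))
    # The normalization is applied after the residual sum.
    return layer_norm(add(x, sublayer(x)))


x = [1.0, 2.0, 3.0, 4.0]
pre = pre_ln_step(x)
post = post_ln_step(x)
```

In the pre-LN step the input `x` reaches the output unchanged through the residual sum, whereas in the post-LN step the whole sum is renormalized, so the block's output always has zero mean and roughly unit variance. The two forms are therefore not interchangeable without affecting how activations and gradients propagate through a deep stack.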