
Transformer prenorm location

Open Philonoist opened this issue 1 year ago • 1 comments

Feature description

Currently, the transformer module has a norm_first flag, but I think it is not used as intended. Right now it does this: x = norm(x); x += FF(x). According to https://arxiv.org/pdf/2002.04745.pdf, the normalization should be applied after branching off the residual path: x += FF(norm(x)). This is the recommended way to do normalization in transformers today, AFAIU. I tried making a PR, https://github.com/tracel-ai/burn/pull/1054, but for some reason I don't understand, the test fails only on the torch backend...
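To make the difference between the two orderings concrete, here is a minimal sketch in plain Rust. It is not burn's implementation and uses no burn APIs: `layer_norm`, `feed_forward`, `norm_then_residual`, and `pre_norm` are hypothetical helpers operating on a single feature vector, and the feed-forward sublayer is replaced by an element-wise ReLU purely for illustration.

```rust
/// Toy layer normalization over one feature vector (no learned scale or shift).
fn layer_norm(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let denom = (var + 1e-5).sqrt();
    x.iter().map(|v| (v - mean) / denom).collect()
}

/// Hypothetical stand-in for the feed-forward sublayer: an element-wise ReLU.
fn feed_forward(x: &[f64]) -> Vec<f64> {
    x.iter().map(|v| v.max(0.0)).collect()
}

/// Ordering the issue describes as the current behavior:
/// x = norm(x); x += FF(x). The residual path carries the normalized input.
fn norm_then_residual(x: &[f64]) -> Vec<f64> {
    let normed = layer_norm(x);
    let ff = feed_forward(&normed);
    normed.iter().zip(&ff).map(|(a, b)| a + b).collect()
}

/// Pre-norm ordering: x += FF(norm(x)).
/// The residual path carries the original, un-normalized input.
fn pre_norm(x: &[f64]) -> Vec<f64> {
    let ff = feed_forward(&layer_norm(x));
    x.iter().zip(&ff).map(|(a, b)| a + b).collect()
}
```

Calling both functions on the same input, for example `[1.0, 2.0, 3.0, 10.0]`, shows the distinction: `pre_norm` leaves the original values on the skip connection and only adds the sublayer output to them, while `norm_then_residual` replaces the skip connection with the normalized values.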

Philonoist avatar Dec 07 '23 21:12 Philonoist

@Philonoist there is a bug in the tch backend that I'm going to work on very soon, so you can ignore those failures for now.

nathanielsimard avatar Dec 12 '23 18:12 nathanielsimard

Fixed in #1054

louisfd avatar Jan 12 '24 13:01 louisfd