TransformerEngine
MLP without LayerNorm
Is there currently a way to use MLP without applying the LayerNorm? What would be the best way to implement this? Thanks!
The simplest solution would be to manually construct an MLP out of multiple te.Linear modules, but this won't be able to do all of the kernel fusions in te.LayerNormMLP. A rough sketch of that approach is shown below.
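Here is a minimal sketch of an MLP built from te.Linear without any LayerNorm. The class name, the hidden/FFN sizes, and the use of a plain-PyTorch GELU are illustrative assumptions, not an official TE pattern:

```python
import torch
import transformer_engine.pytorch as te

class PlainMLP(torch.nn.Module):
    """MLP built from two te.Linear layers, with no LayerNorm applied."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.fc1 = te.Linear(hidden_size, ffn_hidden_size, bias=True)
        self.fc2 = te.Linear(ffn_hidden_size, hidden_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The activation runs as a separate PyTorch op here, so it is not
        # fused with the GEMMs the way te.LayerNormMLP fuses bias + GELU.
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))
```

The forward pass can still be wrapped in te.fp8_autocast() for FP8 execution; you only give up the fused LayerNorm and bias + GELU kernels that te.LayerNormMLP provides.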
Long-term, this kind of customization is the purpose of the operation-based API being developed in https://github.com/NVIDIA/TransformerEngine/pull/707:
mlp = te.Sequential(
    te.ops.Linear(...),
    te.ops.GeLU(),
    te.ops.Linear(...),
)