TransformerEngine
MLP without LayerNorm
Is there currently a way to use MLP without applying the LayerNorm? What would be the best way to implement this? Thanks!
The simplest solution would be to manually construct an MLP out of multiple te.Linear modules, but this won't be able to do all of the kernel fusions in te.LayerNormMLP. A rough sketch of that approach is shown below.
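Here is a minimal sketch of an MLP built from te.Linear without any LayerNorm. The class name, the hidden/FFN sizes, and the use of a plain-PyTorch GELU are illustrative assumptions, not an official TE pattern:

```python
import torch
import transformer_engine.pytorch as te

class PlainMLP(torch.nn.Module):
    """MLP built from two te.Linear layers, with no LayerNorm applied."""

    def __init__(self, hidden_size: int, ffn_hidden_size: int):
        super().__init__()
        self.fc1 = te.Linear(hidden_size, ffn_hidden_size, bias=True)
        self.fc2 = te.Linear(ffn_hidden_size, hidden_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The activation runs as a separate PyTorch op here, so it is not
        # fused with the GEMMs the way te.LayerNormMLP fuses bias + GELU.
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))
```

The forward pass can still be wrapped in te.fp8_autocast() for FP8 execution; you only give up the fused LayerNorm and bias + GELU kernels that te.LayerNormMLP provides.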
Long-term, this kind of customization is the purpose of the operation-based API being developed in https://github.com/NVIDIA/TransformerEngine/pull/707:
mlp = te.Sequential(
    te.ops.Linear(...),
    te.ops.GeLU(),
    te.ops.Linear(...),
)