TransformerEngine
Parallel residual Transformer layer
Hi,
Is there a way to enable a parallel residual, similar to the use_parallel_residual config option in Hugging Face GPT-NeoX, to speed up training?
@ksivaman If this is not currently supported, are there any plans to support it?
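
To be concrete about what I mean by parallel residual, here is a minimal pure-Python sketch of the two residual patterns. It is not TransformerEngine code: the function names and the toy `ln1`/`attn`/`ln2`/`mlp` stand-ins are hypothetical, just to show the data flow. In the sequential form the MLP sees the output of the attention residual; in the parallel form (what GPT-NeoX does with use_parallel_residual=True) both branches read the same input and their outputs are summed.

```python
from typing import Callable, List

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for the residual add."""
    return [x + y for x, y in zip(a, b)]


def sequential_block(x: Vector, ln1: SubLayer, attn: SubLayer,
                     ln2: SubLayer, mlp: SubLayer) -> Vector:
    # Standard pre-LN layer: the MLP branch depends on the attention result.
    x = add(x, attn(ln1(x)))
    return add(x, mlp(ln2(x)))


def parallel_block(x: Vector, ln1: SubLayer, attn: SubLayer,
                   ln2: SubLayer, mlp: SubLayer) -> Vector:
    # Parallel residual: both branches read the same input x,
    # so neither has to wait for the other.
    return add(add(x, attn(ln1(x))), mlp(ln2(x)))


# Toy stand-ins (hypothetical, for illustration only).
identity: SubLayer = lambda v: list(v)           # pretend layer norm
toy_attn: SubLayer = lambda v: [2.0 * t for t in v]
toy_mlp: SubLayer = lambda v: [3.0 * t for t in v]

x = [1.0, 2.0]
seq_out = sequential_block(x, identity, toy_attn, identity, toy_mlp)
par_out = parallel_block(x, identity, toy_attn, identity, toy_mlp)
```

The two forms are not numerically equivalent, so this would have to be an explicit layer option rather than a transparent optimization.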