TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...
# Description This PR mainly adds the partial cast feature for mxfp8 primary weights. In FSDP, since each forward and backward pass requires gathering params, it's better to only gather...
I have tried nvfp4 training, which converges on sm120, but the fp8blockscaled recipe won't converge with any of its available options. Is it because of the power-of-2 scale (cannot be...
I can implement NVFP4-supported linear layer calls with a simple script, but when I use Megatron-LM for NVFP4 training, I found that TE lacks support for NVFP4Tensors in the...
# Description Fixes crashes for a binary linked with both libtorch and libtransformer_engine when running under `nsys profile`. The crash was caused by the wrong libcudnn.so being loaded when a system package like `libcudnn9-cuda-12` is...
# Description FSDP2 Allgather Perf improvement and support for FusedAdam with FSDP2 Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the documentation, either...
# Description The fused cross entropy kernel in Transformer Engine uses 16-bit floating point (BF16) for the backward pass when the input is in BF16, whereas Megatron's VocabParallelCrossEntropy performs its...
1. Fused `moe_permute_with_probs` + `Fp8Padding` and fused `moe_unpermute` + `Fp8Unpadding`, which removes the explicit padding/unpadding in the MoE experts module, improving performance and reducing peak GPU memory usage. 2. Added tests of...
# Description Based on https://github.com/NVIDIA/TransformerEngine/pull/1948 Fixes the CUDA graph order of backward_dw graphs when `delay_wgrad_compute` is enabled. The user may delay the wgrad compute to the end of overlapped forward layers,...
# Description I want to be able to control num splits in FA3. This exposes this argument for non-context-parallel cases. ## Type of change - [ ] Documentation change (change...
# Description Please include a brief summary of the changes, relevant motivation and context. Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the...