TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...
**Is your feature request related to a problem? Please describe.** At the moment, we enumerate the parameters in C APIs like this: https://github.com/NVIDIA/TransformerEngine/blob/5e4e0b2c378d2b1ec2ee65dfa85124e1dd805389/transformer_engine/common/fused_attn/fused_attn.cpp#L835 As we add more features to attention,...
**Is your feature request related to a problem? Please describe.** This is not related to a problem; it is a feature request to expand model coverage. **Describe the solution you'd...
**Describe the bug** Hi, are there any plans to publish prebuilt wheels? Right now, during pip install, the pybind modules are built via CMake in a brittle manner (accessing...
**Is your feature request related to a problem? Please describe.** To be added **Describe the solution you'd like** Work on improving performance for FP8 current scaling **Describe alternatives you've considered**...
**Is your feature request related to a problem? Please describe.** The logic around cuDNN's support matrix for SDPA is getting long and hard to maintain. **Describe the solution you'd like**...
**Describe the bug** A clear and concise description of what the bug is. **Steps/Code to reproduce bug** Please list *minimal* steps or code snippet for us to be able to...
Hi, I locally compiled the release 2.8 branch. When I tried to use NVFP4 on an RTX 50 series GPU, it gave me this error: ``` /home/aza/workspace/projects/nvfp4/TransformerEngine/transformer_engine/common/util/nvfp4_transpose.cuh:234 in function mul_cvt_bf16_to_fp4_4x_with_rn (thread (95,0,0), block (2,2,0)):...
# Description This PR adds a persistent gated MXFP8 kernel optimized for rowwise scaling, SwiGLU activation (FWD and BWD) and BF16/FP16 input tensors. The kernel uses the "Cluster Launch Control"...
Does TransformerEngine have a distributed AdamW optimizer that we can use with DDP?