
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...

Results: 414 TransformerEngine issues

**Is your feature request related to a problem? Please describe.** At the moment, we enumerate the parameters in C APIs like this:
https://github.com/NVIDIA/TransformerEngine/blob/5e4e0b2c378d2b1ec2ee65dfa85124e1dd805389/transformer_engine/common/fused_attn/fused_attn.cpp#L835
As we add more features to attention, ...

refactor
attention
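
For context, a common pattern for keeping such a C API extensible (a hypothetical sketch of the general technique, not TransformerEngine's actual interface; all names below are invented) is to replace the enumerated arguments with a size-tagged parameter struct:

```cpp
// Hypothetical sketch: a versioned parameter struct keeps the C ABI stable
// as new attention features are added, instead of growing the argument list.
#include <stddef.h>
#include <stdint.h>

typedef struct {
  size_t struct_size;  // caller sets this to sizeof(AttnParams); the library
                       // uses it to detect which fields the caller knows about
  int64_t batch;       // batch size
  int64_t num_heads;   // number of attention heads
  int64_t head_dim;    // per-head hidden dimension
  float   dropout_p;   // attention dropout probability
  int     is_causal;   // nonzero for causal masking
  // New fields are appended here; older callers simply never set them.
} AttnParams;

// Instead of fused_attn_fwd(batch, num_heads, head_dim, dropout_p, causal, ...)
int fused_attn_fwd(const AttnParams* params);
```

The `struct_size` tag lets new attention features be appended without breaking existing callers, which is the usual motivation for this style of refactor.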

**Is your feature request related to a problem? Please describe.** This is not related to a problem; it is a feature request to expand model coverage. **Describe the solution you'd...

attention

**Describe the bug** Hi, are there any plans to publish prebuilt wheels? Right now, during `pip install`, the pybind modules are built via CMake in a brittle manner (accessing...

build

**Is your feature request related to a problem? Please describe.** To be added. **Describe the solution you'd like** Work on improving performance for FP8 current scaling. **Describe alternatives you've considered**...

performance
attention
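
For background, "current scaling" derives the FP8 scale from the amax of the tensor being quantized right now, rather than from an amax history as in delayed scaling; the extra reduction pass over the live tensor is the main performance cost. A minimal sketch of the idea (not TE's implementation):

```cpp
// Minimal sketch of FP8 current scaling: compute the scale from the amax of
// the tensor being quantized, immediately before quantization.
#include <algorithm>
#include <cmath>
#include <vector>

constexpr float kFp8E4M3Max = 448.0f;  // largest finite value in FP8 E4M3

float compute_current_scale(const std::vector<float>& x) {
  float amax = 0.0f;
  for (float v : x) amax = std::max(amax, std::fabs(v));
  // Map the observed dynamic range onto the FP8 representable range.
  return amax > 0.0f ? kFp8E4M3Max / amax : 1.0f;
}
// Quantization then rounds x[i] * scale into FP8; the per-tensor reduction
// above is the extra work that current scaling adds over delayed scaling.
```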

**Is your feature request related to a problem? Please describe.** The logic around cuDNN's support matrix for SDPA is getting long and hard to maintain. **Describe the solution you'd like**...

refactor
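
One standard refactor for this kind of code (a hypothetical sketch; the fields and rules below are invented, not the solution the issue proposes) is to make the support matrix data-driven instead of a chain of nested conditionals:

```cpp
// Hypothetical sketch of a data-driven support matrix: each rule records the
// constraints under which a configuration is supported, so adding a new cuDNN
// version or feature means appending a rule rather than editing if/else logic.
#include <cstdint>
#include <vector>

struct SdpaConfig {
  int64_t head_dim;
  bool    is_causal;
  int     cudnn_version;  // e.g. 90100 for cuDNN 9.1.0
};

struct SupportRule {
  int     min_cudnn_version;
  int64_t max_head_dim;
  bool    requires_causal;
};

bool is_supported(const SdpaConfig& cfg, const std::vector<SupportRule>& rules) {
  for (const auto& r : rules) {
    if (cfg.cudnn_version >= r.min_cudnn_version &&
        cfg.head_dim <= r.max_head_dim &&
        (!r.requires_causal || cfg.is_causal)) {
      return true;  // first matching rule wins
    }
  }
  return false;
}
```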

**Describe the bug** A clear and concise description of what the bug is. **Steps/Code to reproduce bug** Please list *minimal* steps or code snippet for us to be able to...

bug
waiting-for-feedback

Hi, I locally compiled the release 2.8 branch. When I tried to use NVFP4 on an RTX 50 series GPU, it gave me this error:
```
/home/aza/workspace/projects/nvfp4/TransformerEngine/transformer_engine/common/util/nvfp4_transpose.cuh:234 in function mul_cvt_bf16_to_fp4_4x_with_rn (thread (95,0,0), block (2,2,0)): ...
```

# Description

This PR adds a persistent gated MXFP8 kernel optimized for rowwise scaling, SwiGLU activation (FWD and BWD), and BF16/FP16 input tensors. The kernel uses the "Cluster Launch Control"...
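
For reference, the gated SwiGLU activation being fused is the standard definition, assuming the usual gate/up split of the input (the MXFP8 quantization and persistent-kernel details are not shown):

$$
y = \mathrm{SwiGLU}(x_g, x_u) = \mathrm{SiLU}(x_g) \odot x_u,
\qquad \mathrm{SiLU}(z) = z\,\sigma(z),
$$

and the backward pass uses

$$
\mathrm{SiLU}'(z) = \sigma(z)\bigl(1 + z\,(1 - \sigma(z))\bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial x_g} = \frac{\partial \mathcal{L}}{\partial y} \odot x_u \odot \mathrm{SiLU}'(x_g),
\qquad
\frac{\partial \mathcal{L}}{\partial x_u} = \frac{\partial \mathcal{L}}{\partial y} \odot \mathrm{SiLU}(x_g).
$$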

Does TransformerEngine have a distributed AdamW optimizer that we can use with DDP?