TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Results 414 TransformerEngine issues

# Description The functionality is ready, but we're not seeing a perf gain due to a performance regression in the fused activation and quantization kernels; for example, take an input of shape (8*4000, 4096)...
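The kind of measurement this refers to can be sketched with a plain PyTorch microbenchmark; the GELU-then-cast path below is only a stand-in for TE's fused activation/quantization kernels, and the shape is the one quoted in the issue.

```python
# Minimal sketch: unfused activation + FP8 cast as a timing baseline.
# (Assumption: plain PyTorch stand-ins, not TE's actual fused kernels.)
import torch

def bench(fn, x, iters=100):
    for _ in range(10):          # warm-up
        fn(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters   # ms per call

x = torch.randn(8 * 4000, 4096, device="cuda", dtype=torch.bfloat16)

def gelu_then_cast(t):
    # Activation followed by a separate cast to an 8-bit floating-point format.
    return torch.nn.functional.gelu(t).to(torch.float8_e4m3fn)

print(f"gelu + fp8 cast: {bench(gelu_then_cast, x):.3f} ms")
```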

**Is your feature request related to a problem? Please describe.** `Ulysses SP + ring attention` gives good performance in SFT/RL training, which is called `hierarchical CP` here. But it...
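For orientation, a hierarchical split of the context-parallel ranks could look roughly like the sketch below; the helper and the group layout are hypothetical, assume `torch.distributed` is already initialized, and do not reflect TE's actual hierarchical-CP implementation.

```python
# Minimal sketch of a two-level context-parallel group layout (hypothetical helper).
import torch.distributed as dist

def build_hierarchical_cp_groups(cp_ranks, ulysses_size):
    """Split one CP group into inner (Ulysses all-to-all) and outer
    (ring-attention P2P) subgroups."""
    assert len(cp_ranks) % ulysses_size == 0
    inner_groups = []
    # Inner groups: contiguous blocks of ranks exchanging heads via all-to-all.
    for i in range(0, len(cp_ranks), ulysses_size):
        inner_groups.append(dist.new_group(cp_ranks[i:i + ulysses_size]))
    # Outer groups: one rank per inner block, chained into a KV-passing ring.
    outer_groups = [dist.new_group(cp_ranks[j::ulysses_size])
                    for j in range(ulysses_size)]
    return inner_groups, outer_groups
```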

Hi @taesiri 🤗 I'm Niels and work as part of the open-source team at Hugging Face. I discovered your work through Hugging Face's daily papers as yours got featured: https://huggingface.co/papers/2509.25149....

megatron

Refactors the test_checkpoint.py test suite to be a bit more pytest-native and removes the need to pre-generate checkpoint files. Also adds some (currently failing) torch.dcp and huggingface checkpoint tests.
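A minimal sketch of the pytest-native pattern this describes, i.e. building the checkpoint on the fly inside a `tmp_path`-backed fixture instead of pre-generating files; the toy `torch.nn.Linear` model is illustrative, not the repo's actual test.

```python
# Minimal sketch (toy model; assumes pytest and torch are installed).
import pytest
import torch

@pytest.fixture
def checkpoint_path(tmp_path):
    # Create the checkpoint on the fly instead of shipping a pre-built file.
    model = torch.nn.Linear(16, 16)
    path = tmp_path / "model.pt"
    torch.save(model.state_dict(), path)
    return path

def test_checkpoint_roundtrip(checkpoint_path):
    model = torch.nn.Linear(16, 16)
    model.load_state_dict(torch.load(checkpoint_path))
    assert all(p.isfinite().all() for p in model.parameters())
```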

Adds some currently failing huggingface tests around safetensors and quantized_model_init

# Description Fixes a bug that causes precision issues in mixed-precision training. The current implementation of the copy_ method in the QuantizedTensor class does not properly pass the dst.dtype information when src is...

community-contribution
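The expected semantics behind that copy_ fix can be illustrated with plain tensors; this is only a stand-in for TE's QuantizedTensor, whose copy_ additionally has to dequantize.

```python
# Minimal sketch of the expected copy_ dtype behavior (plain tensors, not QuantizedTensor).
import torch

src = torch.randn(4, 4, dtype=torch.bfloat16)   # stands in for the quantized source
dst = torch.empty(4, 4, dtype=torch.float32)    # higher-precision destination

dst.copy_(src)                                  # copy_ must cast into dst.dtype ...
assert dst.dtype == torch.float32               # ... rather than adopt src's dtype
```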

# Description In Megatron-Core + Transformer Engine (TE), we quantize activations to FP8 before the MoE up-projection and then run the dispatch. This is compatible with TE’s FP8 fprop for...
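The ordering described there (quantize first, then dispatch, then the FP8 up-projection) can be sketched on a single GPU as follows; the shapes, the plain FP8 cast, and the argsort-based dispatch are illustrative stand-ins for TE's scaled quantization and the real all-to-all dispatch.

```python
# Minimal single-GPU sketch of the "quantize -> dispatch -> FP8 expert GEMM" ordering.
import torch

tokens = torch.randn(1024, 4096, device="cuda", dtype=torch.bfloat16)
expert_ids = torch.randint(0, 8, (1024,), device="cuda")

# 1) Quantize activations once, before dispatch, so communication moves FP8 bytes.
tokens_fp8 = tokens.to(torch.float8_e4m3fn)

# 2) Dispatch: group tokens by assigned expert (bytes viewed as uint8 so the
#    indexing works regardless of FP8 operator coverage).
order = torch.argsort(expert_ids)
dispatched = tokens_fp8.view(torch.uint8)[order].view(torch.float8_e4m3fn)

# 3) In the real pipeline, TE's FP8 fprop (the MoE up-projection) consumes the
#    already-quantized, dispatched tokens.
```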

# Description This PR adapts to activation offloading (a new feature in Megatron-LM, https://github.com/NVIDIA/Megatron-LM/pull/1752). Activation offloading selects the inputs of specific modules (such as `core_attn`, `qkv_linear`, `router_fc1`), offloading...

megatron
community-contribution
waiting-for-feedback
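As background on the mechanism, PyTorch's generic saved-tensor hooks give a rough idea of what offloading a selected module's activations looks like; this sketch uses save_on_cpu as a stand-in for the Megatron-LM feature and a toy linear layer in place of modules like `core_attn`.

```python
# Minimal sketch: offload activations saved for backward to host memory.
# (Assumption: PyTorch's generic save_on_cpu hook, not the Megatron-LM implementation.)
import torch
from torch.autograd.graph import save_on_cpu

layer = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(8, 4096, device="cuda", requires_grad=True)

# Tensors saved for backward inside this block live in pinned host memory and are
# copied back to the GPU only when the backward pass needs them.
with save_on_cpu(pin_memory=True):
    y = layer(x)
y.sum().backward()
```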

# Description Fixes assertion error message formatting in DotProductAttention ## Type of change - [ ] Documentation change (change only to the documentation, either a fix or new content)...

community-contribution
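For illustration only (not the repo's actual assertion), a formatted, value-carrying assertion message of the kind such a fix produces looks like:

```python
# Hypothetical example of a well-formatted assertion message (illustrative only).
import torch

def check_dtypes(q, kv):
    # Include both offending values in the message so failures are self-explanatory.
    assert q.dtype == kv.dtype, (
        f"DotProductAttention expects matching dtypes, got Q={q.dtype} and KV={kv.dtype}"
    )

check_dtypes(torch.empty(1, dtype=torch.bfloat16), torch.empty(1, dtype=torch.bfloat16))
```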

Hello, I am trying to run the latest NVIDIA Cosmos model on an RTX 4090 and I get an error when fused attention is called: line 1080 in fused_attn.py...
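When debugging reports like this, TE's attention environment variables can help show which backend is being selected and, if needed, force a fallback from the fused path; the exact semantics can differ between TransformerEngine versions, so treat this as a sketch.

```python
# Sketch: inspect / steer TE attention backend selection via environment variables.
# (Assumption: standard TE backend-selection variables; behavior may vary by version.)
import os

os.environ["NVTE_DEBUG"] = "1"         # print backend-selection info
os.environ["NVTE_FUSED_ATTN"] = "0"    # disable the cuDNN fused-attention backend
os.environ["NVTE_FLASH_ATTN"] = "1"    # prefer flash-attention if it is available
# These must be set before transformer_engine is imported / the first attention call.
```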