TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...

Results 414 TransformerEngine issues

# Description This PR creates the following folder:
```
TransformerEngine/examples/pytorch/transformer:
├── context_parallel_runner_bshd.py
├── context_parallel_runner_thd.py
├── model.py
├── __pycache__
├── README.md
├── run_context_parallel.sh
├── test_context_parallel_bshd.py
├── test_context_parallel_thd.py
└── utils.py
```
That...

2.10.0

# Description This PR enables persistency of the MXFP8 cast kernel using WorkID Query feature on Blackwell (sm100a). Fixes # (issue) ## Type of change - [ ] Documentation change...

# Description This PR introduces support for CP + THD + chunked attention Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the documentation,...

2.10.0

# Description Support MLA CP exchanging latent KV (instead of the complete KV) for ring attention. Fixes # (issue) ## Type of change - [ ] Documentation change (change only...

# Description This is a small refactor of library loading logic during runtime to be more consistent and avoid duplication. The main point is to check python packages as a...

# Description This PR adds support for NVFP4 statistics: underflows and MSE. I add them as a separate feature, because we may want to have a lot of NVFP4-specific features added later....

# Description Motivation: https://github.com/NVIDIA/TransformerEngine/issues/2053 Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the documentation, either a fix or a new content) - [...

I've found that the latest docker images (and presumably this repo broadly) do not support RTX Pro 6000 (SM120) for MXFP8 (see the error below). I've been unable to find any...

# Description This PR adds a short custom feature tutorial to the precision debug tools docs. ## Type of change - [x] Documentation change (change only to the documentation, either a fix...

# Description Rework PDL for quantization in #2001 and #2066. Add two quantization configs - `pdl_sync`: Add `cudaGridDependencySynchronize` to the first quantization kernel, to make sure the previous unknown kernel...