TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...
# Description This PR creates the following folder:
```
TransformerEngine/examples/pytorch/transformer:
├── context_parallel_runner_bshd.py
├── context_parallel_runner_thd.py
├── model.py
├── __pycache__
├── README.md
├── run_context_parallel.sh
├── test_context_parallel_bshd.py
├── test_context_parallel_thd.py
└── utils.py
```
That...
# Description This PR enables persistence of the MXFP8 cast kernel using the WorkID Query feature on Blackwell (sm100a). Fixes # (issue) ## Type of change - [ ] Documentation change...
# Description This PR introduces support for CP + THD + chunked attention Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the documentation,...
# Description Support exchanging the latent KV (instead of the complete KV) between CP ranks in MLA ring attention. Fixes # (issue) ## Type of change - [ ] Documentation change (change only...
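To illustrate why exchanging the latent is attractive, here is a back-of-the-envelope sketch of per-step ring-attention communication volume. All names and shapes are illustrative assumptions (the 128 heads of dim 128 and latent dim 512 echo published MLA configurations, not necessarily what this PR targets):

```python
def ring_exchange_bytes(seq_len, elem_bytes, *, num_heads, head_dim, latent_dim):
    """Per-step communication volume in ring attention (illustrative sketch).

    full_kv: exchanging complete K and V tensors — one [head_dim] vector per
             head per token, for both K and V.
    latent:  exchanging only the compressed MLA latent — one [latent_dim]
             vector per token, shared across all heads (ignoring the small
             decoupled RoPE key part for simplicity).
    """
    full_kv = seq_len * num_heads * head_dim * 2 * elem_bytes
    latent = seq_len * latent_dim * elem_bytes
    return full_kv, latent

# With the illustrative shapes above, per exchanged token (elem_bytes=1):
full, lat = ring_exchange_bytes(1, 1, num_heads=128, head_dim=128, latent_dim=512)
# full == 32768, lat == 512 — the latent is 64x smaller than the full KV.
```

The trade-off is that each rank must up-project the received latent back to per-head K/V before computing attention, spending compute to save bandwidth.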
# Description This is a small refactor of the library-loading logic at runtime, making it more consistent and avoiding duplication. The main point is to check python packages as a...
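The "check Python packages first" idea can be sketched as follows. This is a hedged illustration only: the function name `find_shared_lib` and the exact search order are assumptions, not TransformerEngine's actual code:

```python
# Illustrative sketch: prefer a shared library shipped inside an installed
# Python package, then fall back to the dynamic loader's normal search.
import importlib.util
from pathlib import Path

def find_shared_lib(pkg_name: str, lib_name: str) -> str:
    """Return a path to lib_name inside pkg_name if that package is installed
    and ships the file; otherwise return the bare name so the dynamic loader
    searches the usual paths (LD_LIBRARY_PATH, system directories)."""
    spec = importlib.util.find_spec(pkg_name)
    if spec is not None and spec.origin is not None:
        candidate = Path(spec.origin).parent / lib_name
        if candidate.exists():
            return str(candidate)
    return lib_name
```

Centralizing a helper like this avoids duplicating the package-vs-system lookup at every load site, which matches the PR's stated goal of consistency.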
# Description This PR adds support for two NVFP4 statistics: underflows and MSE. They are added as a separate feature, because many more NVFP4-specific features may be added later....
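Conceptually, the two statistics can be sketched in pure Python. The exact definitions in TE's precision debug tools may differ; the function name and the underflow definition below are assumptions:

```python
def quantization_stats(original, quantized):
    """Sketch of the two statistics named in the PR:
    - underflow rate: fraction of nonzero inputs that quantize to exactly 0
      (values too small for the narrow NVFP4 dynamic range)
    - mse: mean squared error between original and dequantized values
    """
    nonzero = sum(1 for x in original if x != 0)
    underflows = sum(1 for x, q in zip(original, quantized) if x != 0 and q == 0)
    mse = sum((x - q) ** 2 for x, q in zip(original, quantized)) / len(original)
    return underflows / max(nonzero, 1), mse
```

For example, if a small value like `0.001` rounds to `0` under NVFP4 while larger values survive, it counts toward the underflow rate and contributes its squared magnitude to the MSE.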
# Description Motivation: https://github.com/NVIDIA/TransformerEngine/issues/2053 Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the documentation, either a fix or a new content) - [...
I've found that the latest Docker images (and presumably this repo more broadly) do not support the RTX Pro 6000 (SM120) for MXFP8 (see the error below). I've been unable to find any...
# Description This PR adds a short custom-feature tutorial to the precision debug tools docs. ## Type of change - [x] Documentation change (change only to the documentation, either a fix...
# Description Reworks PDL (Programmatic Dependent Launch) for quantization from #2001 and #2066. Adds two quantization configs - `pdl_sync`: adds `cudaGridDependencySynchronize` to the first quantization kernel, to make sure the previous unknown kernel...