TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...
Potential performance regression in v2.9.0 due to CUTLASS GroupedGemm kernels on large token sizes
**Describe the bug** More details in the original PyTorch issue: https://github.com/pytorch/pytorch/issues/163425 **Steps/Code to reproduce bug** ...
# Description Please include a brief summary of the changes, relevant motivation and context. Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the...
# Description Replacing `tex.dequantize` with a torch implementation fixes the issue for now. We still need to root-cause why `tex.dequantize` did not work for the async CP path while it works for...
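The PR body above is truncated, but the idea of a scale-based dequantize fallback can be sketched in plain Python. This is an illustrative assumption, not Transformer Engine's actual implementation: the helper names below are hypothetical, and real FP8 dequantization in TE multiplies the low-precision tensor by a stored scale-inverse factor rather than operating on Python lists.

```python
# Hypothetical sketch of per-tensor scaled quantize/dequantize.
# In a real torch fallback this would be roughly:
#   tensor.to(torch.float32) * scale_inv
# Here we simulate it with plain Python for illustration.

def quantize(values, scale):
    # Scale up and round to the nearest integer, simulating the loss
    # of precision that a low-bit format introduces.
    return [round(v * scale) for v in values]

def dequantize(qvalues, scale):
    # Invert the scaling to recover approximate original values.
    # Round-trip error is bounded by 0.5 / scale per element.
    return [q / scale for q in qvalues]

vals = [0.5, -1.25, 3.0]
scale = 16.0
restored = dequantize(quantize(vals, scale), scale)
```

Because every sample value here is exactly representable after scaling by 16, the round trip is lossless in this toy case; with arbitrary inputs the dequantized values differ from the originals by at most `0.5 / scale`.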