TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization i...
Potential performance regression in v2.9.0 due to CUTLASS GroupedGemm kernels on large token sizes
**Describe the bug** More details in the original PyTorch issue: https://github.com/pytorch/pytorch/issues/163425 **Steps/Code to reproduce bug** ...
# Description Please include a brief summary of the changes, relevant motivation and context. Fixes # (issue) ## Type of change - [ ] Documentation change (change only to the...
# Description Replacing `tex.dequantize` with a torch implementation fixes the issue for now. We still need to root-cause why `tex.dequantize` did not work for the async CP path while it works for...
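The PR body above is truncated, but the idea of a scale-based dequantize fallback can be sketched in plain Python. This is an illustrative assumption, not Transformer Engine's actual implementation: the helper names below are hypothetical, and real FP8 dequantization in TE multiplies the low-precision tensor by a stored scale-inverse factor rather than operating on Python lists.

```python
# Hypothetical sketch of per-tensor scaled quantize/dequantize.
# In a real torch fallback this would be roughly:
#   tensor.to(torch.float32) * scale_inv
# Here we simulate it with plain Python for illustration.

def quantize(values, scale):
    # Scale up and round to the nearest integer, simulating the loss
    # of precision that a low-bit format introduces.
    return [round(v * scale) for v in values]

def dequantize(qvalues, scale):
    # Invert the scaling to recover approximate original values.
    # Round-trip error is bounded by 0.5 / scale per element.
    return [q / scale for q in qvalues]

vals = [0.5, -1.25, 3.0]
scale = 16.0
restored = dequantize(quantize(vals, scale), scale)
```

Because every sample value here is exactly representable after scaling by 16, the round trip is lossless in this toy case; with arbitrary inputs the dequantized values differ from the originals by at most `0.5 / scale`.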