Tim Moon
It looks like PyTorch's C++ extensions pick up cuDNN from `CUDNN_HOME` or `CUDNN_PATH` (https://github.com/pytorch/pytorch/blob/5a80d2df844f9794b3b7ad91eddc7ba762760ad0/torch/utils/cpp_extension.py#L209), while PyTorch's own build is configured with `CUDNN_ROOT` (https://github.com/pytorch/pytorch/blob/5a80d2df844f9794b3b7ad91eddc7ba762760ad0/cmake/Modules_CUDA_fix/FindCUDNN.cmake#L4). Setting `CUDNN_PATH` before installing should therefore work:
```bash
export CUDNN_PATH=/path/to/cudnn
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
```
This bug should be fixed by https://github.com/NVIDIA/TransformerEngine/pull/1335, which is included in Transformer Engine 2.0.
We have used `torch.compile` to fuse some operations like bias+GeLU in `LayerNormMLP` (see [`bias_gelu_fused_`](https://github.com/NVIDIA/TransformerEngine/blob/b36bd0a458424eac939669ae05231726b3461b0d/transformer_engine/pytorch/jit.py#L60)). However, we have not yet done serious work applying `torch.compile` to FP8 kernels since we're not...
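For illustration, here is a minimal standalone sketch of that fusion pattern (the `bias_gelu` function below is a hypothetical example, not Transformer Engine's actual `bias_gelu_fused_`), assuming PyTorch 2.x and a CUDA device:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the bias+GeLU fusion idea: torch.compile can
# fuse the bias add and the GeLU activation into a single kernel,
# avoiding an extra round trip to global memory.
@torch.compile
def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    return F.gelu(x + bias, approximate="tanh")

x = torch.randn(16, 4096, device="cuda")
bias = torch.randn(4096, device="cuda")
y = bias_gelu(x, bias)
```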
Matmuls are ideal for FP8 compute since they can take advantage of Tensor Cores and they're less sensitive to quantization error. While other operations might benefit (especially from reduced memory...
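For a concrete picture of what this looks like in practice, here is a minimal sketch using Transformer Engine's `fp8_autocast` (it assumes an FP8-capable GPU such as Hopper and Transformer Engine installed; the layer sizes are arbitrary):

```python
import torch
import transformer_engine.pytorch as te

# The Linear layer's GEMM runs in FP8 on Tensor Cores inside
# fp8_autocast; surrounding ops stay in higher precision.
linear = te.Linear(1024, 1024, bias=True, params_dtype=torch.bfloat16)
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True):
    y = linear(x)  # inputs and weights are quantized to FP8 for the matmul
```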