
[FEATURE] Use of Nvidia Transformer Engine

Open yazdanimehdi opened this issue 1 year ago • 1 comment

Use https://github.com/NVIDIA/TransformerEngine to speed up transformer-based models on new Nvidia Hopper GPUs and enable float8 training.
Ideally it would detect that you are using an Ada-based GPU and adapt to use Transformer Engine.


yazdanimehdi avatar Jun 21 '23 02:06 yazdanimehdi

@yazdanimehdi finally got around to picking up a 4090. It's a nice, decent boost when using torch.compile.

I tried fiddling with Transformer Engine and FP8 autocast, and it wasn't very helpful. I feel it needs rewriting the models to use fused layers and to fully integrate the attention. Just doing the 'easy' bits, such as converting nn.Linear and nn.LayerNorm and using te.fp8_autocast, is slower than using torch-native AMP with F.sdpa + bfloat16. With TE, the attention won't be using a fast kernel, some of the matmuls won't be cast to lower precision, and it seems you cannot combine torch autocast with TE autocast.

So, until torch decides to include Ada/Hopper-compatible FP8 support & casting plus optimized kernels for e.g. F.sdpa, I don't think there is much point. I am not going to maintain multiple versions of various blocks / models, etc. with TE vs. without.

rwightman avatar Aug 28 '23 21:08 rwightman