Driss Guessous
cc @andrewor14 this is the theme update I was talking about
@andrewor14 Just rebased, let's see if the docs populate
@pytorchbot merge
Unfortunately, Float8InferenceLinear is being developed against the latest PyTorch nightly and is not well tested on older versions of PyTorch. If it is possible for you to update your...
https://github.com/pytorch/torchtitan/pull/1208
@eqy does cuDNN JIT-compile for every updated sequence length? That seems non-ideal
Started to work on the pre-reqs: https://github.com/pytorch/pytorch/pull/143515. But yeah, as of right now the most performant kernel we have in PyTorch is the cuDNN backend on H100.
`SDPBackend.CUDNN_ATTENTION` is the fastest implementation currently supported for SDPA and is meant for H100+ GPUs. For A100s and A10s, FlashAttention v2 is still your best bet. >is much faster...
Looks good. I also hope this ends up being a pretty small PR, since we had this enabled previously in fp8 experimental.
This linter seems to be unaware of f-strings: 