Driss Guessous
## Summary To better enable users to write Triton kernels on nested tensors, having empty_like support is helpful. This will be used to create a nested_tensor output buffer to...
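As a hedged illustration of the intended use, the sketch below allocates a nested-tensor output buffer with empty_like; the sequence shapes and values are made up.

```python
import torch

# Illustrative only: a nested tensor of two variable-length sequences.
nt = torch.nested.nested_tensor([torch.randn(3, 8), torch.randn(5, 8)])

# empty_like allocates an uninitialized nested tensor with the same
# nested sizes, ready to serve as an output buffer for a kernel.
out = torch.empty_like(nt)
```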
# Summary This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result"...
# Summary This exposes the _scaled_dot_product_attention function to Python in the nn namespace. It is still underscored because the API for args and kwargs is still in flux for the...
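A usage sketch, written against today's stable public spelling; the PR-era entry point was the underscored _scaled_dot_product_attention with a similar but still-evolving signature (an assumption), so treat the call shape below as illustrative.

```python
import torch
import torch.nn.functional as F

# Shapes are (batch, heads, seq_len, head_dim); values are made up.
q = torch.randn(2, 4, 16, 8)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

# Public spelling shown here; the PR exposed the underscored
# predecessor in the nn namespace.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```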
# Summary This shows the run mechanism for dispatching to a custom backend kernel or defaulting to the composite version for SDP. Currently we short-circuit to the composite implementation, but in...
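A hypothetical pure-Python sketch of the selection logic being described; the real dispatch lives in C++, and every helper name below is illustrative, not the actual API.

```python
import math
import torch

def composite_sdp(q, k, v):
    # The composite (math) fallback: explicit softmax(QK^T / sqrt(d)) @ V.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

def fused_kernel_viable(q, k, v):
    # Placeholder constraint check; the real gating inspects dtype,
    # device, head dim, etc. Always returning False mirrors the
    # current short-circuit to the composite path.
    return False

def run_sdp(q, k, v):
    if fused_kernel_viable(q, k, v):
        raise NotImplementedError("fused path not wired up in this sketch")
    return composite_sdp(q, k, v)

out = run_sdp(*(torch.randn(2, 4, 16, 8) for _ in range(3)))
```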
# Summary At the start of the fused-code integration, Windows was not supported. This PR removes that limitation.
# Summary In preparation for the PyTorch 2.0 launch, this PR adjusts the private function API and makes it public. TODO: fill out the rest of the description
# Summary This PR creates the _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them in derivatives.yaml. The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](https://github.com/HazyResearch/flash-attention/blob/33e0860c9c5667fded5af674882e731909096a7f/flash_attn/flash_attn_interface.py#L126)...
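For orientation, here is a minimal sketch of the torch.autograd.Function pattern being moved into native functions. It is written against the composite math formulation rather than the fused CUDA kernels, and none of it is the actual FlashAttention code.

```python
import math
import torch

class SDPAFunction(torch.autograd.Function):
    """Sketch of the forward/backward pairing the native functions replace."""

    @staticmethod
    def forward(ctx, q, k, v):
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1
        )
        ctx.save_for_backward(q, k, v, attn)
        return attn @ v

    @staticmethod
    def backward(ctx, grad_out):
        q, k, v, attn = ctx.saved_tensors
        scale = 1.0 / math.sqrt(q.size(-1))
        grad_v = attn.transpose(-2, -1) @ grad_out
        grad_attn = grad_out @ v.transpose(-2, -1)
        # Softmax backward: dS = P * (dP - sum(dP * P, dim=-1)).
        grad_scores = attn * (
            grad_attn - (grad_attn * attn).sum(dim=-1, keepdim=True)
        )
        grad_q = grad_scores @ k * scale
        grad_k = grad_scores.transpose(-2, -1) @ q * scale
        return grad_q, grad_k, grad_v

# Sanity-check the hand-written backward against numerical gradients.
q, k, v = (
    torch.randn(2, 2, 5, 4, dtype=torch.double, requires_grad=True)
    for _ in range(3)
)
assert torch.autograd.gradcheck(SDPAFunction.apply, (q, k, v))
```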
# Summary WIP still
# Summary ### Current Status - [PR](https://github.com/pytorch/pytorch/pull/118935) aims to update the flash attention kernels to a more recent version. - The update is **partially blocked** due to a dependency on...
# Summary Updates the FlashAttention kernel code from tag [2.3.6](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.3.6) to [2.5.5](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.5.5). The usual changes were then re-applied on top of the updated kernel, changing how dropout state is saved for backward, removing...
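As a usage note, a hedged sketch of forcing the flash backend so the updated kernels are the ones exercised, using the sdpa_kernel context manager available in recent PyTorch releases; it requires a CUDA device and fp16/bf16 inputs, and the shapes are made up.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Requires CUDA; the flash backend supports fp16/bf16 inputs.
q, k, v = (
    torch.randn(2, 4, 128, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)

# Restrict dispatch to the flash-attention kernel for this region.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```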