Driss Guessous

Showing 39 issues authored by Driss Guessous

# Summary To better enable users to write Triton kernels on nested tensors, having empty_like support is helpful. This will be used to create a nested_tensor output buffer to...

cla signed
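A minimal usage sketch of what the PR enables, assuming a build where empty_like works on nested tensors: allocate an output buffer that mirrors a nested input, e.g. to hand to a custom Triton kernel.

```python
import torch

# Build a nested tensor from ragged components.
nt = torch.nested.nested_tensor([torch.randn(2, 8), torch.randn(3, 8)])

# empty_like gives an uninitialized buffer with the same nested structure,
# which a custom kernel can then fill in place.
out = torch.empty_like(nt)
```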

# Summary This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result"...

module: autograd
module: nestedtensor
cla signed
release notes: nested tensor
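Not the fix itself, but a minimal Python sketch of the pattern the summary refers to: a derivative that saves the op's result (its output) rather than its inputs for use in backward.

```python
import torch

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.exp()
        ctx.save_for_backward(y)   # store the result, not the input
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        return grad_out * y        # d/dx exp(x) = exp(x), i.e. the result

x = torch.randn(3, requires_grad=True)
Exp.apply(x).sum().backward()
```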

# Summary This exposes the _scaled_dot_product_attention function to Python in the nn namespace. It is still underscored because the API for args and kwargs is still in flux for the...

Merged
cla signed
Reverted
ciflow/trunk
topic: not user facing
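For reference, a usage sketch: at the time the function lived under an underscored name, but the call shape matches what later became the public torch.nn.functional.scaled_dot_product_attention.

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 128, 64)   # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)

# Public spelling in current PyTorch; the PR above exposed the
# underscored predecessor with a similar signature.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None,
                                     dropout_p=0.0, is_causal=False)
```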

# Summary This shows the runtime mechanism for dispatching to a custom backend kernel or defaulting to the composite version for SDP. Currently we short-circuit to the composite path, but in...

cla signed
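A minimal Python sketch of the selection logic described above (the real dispatch lives in C++; can_use_fused is a hypothetical stand-in for the actual backend constraints):

```python
import math
import torch
import torch.nn.functional as F

def composite_sdp(q, k, v):
    # Reference "composite" implementation: plain matmul + softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v

def can_use_fused(q, k, v):
    # Hypothetical check standing in for the real constraints
    # (device, dtype, head_dim, mask support, ...).
    return q.is_cuda and q.dtype in (torch.float16, torch.bfloat16)

def sdp_dispatch(q, k, v):
    if can_use_fused(q, k, v):
        # Route to a fused backend kernel (stand-in: the public fused entry point).
        return F.scaled_dot_product_attention(q, k, v)
    # Otherwise fall back to the composite version.
    return composite_sdp(q, k, v)
```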

# Summary When the fused code was first integrated, Windows was not supported. This PR removes that limitation.

with-ssh
topic: not user facing
topic: build

# Summary In preparation for the PT 2.0 launch, this PR adjusts the private function API and makes it public. TODO: fill out the rest of the description

# Summary This PR creates _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them in the respective derivatives.yaml entries. The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](https://github.com/HazyResearch/flash-attention/blob/33e0860c9c5667fded5af674882e731909096a7f/flash_attn/flash_attn_interface.py#L126)...
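A heavily simplified, runnable sketch of the pattern being replicated: a torch.autograd.Function whose forward calls a fused kernel and stashes what the backward kernel needs. The kernel here is a math placeholder, not the native _flash_attention_* ops.

```python
import math
import torch

def _placeholder_flash_fwd(q, k, v):
    # Stand-in for the fused forward kernel; also returns the logsumexp
    # that the real backward kernel consumes.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return scores.softmax(dim=-1) @ v, scores.logsumexp(dim=-1)

class FlashAttnSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, q, k, v):
        out, logsumexp = _placeholder_flash_fwd(q, k, v)
        ctx.save_for_backward(q, k, v, out, logsumexp)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        q, k, v, out, logsumexp = ctx.saved_tensors
        # The PR routes this through _scaled_dot_product_flash_attention_backward;
        # here we recompute with autograd as a stand-in.
        with torch.enable_grad():
            qs, ks, vs = (t.detach().requires_grad_() for t in (q, k, v))
            out2, _ = _placeholder_flash_fwd(qs, ks, vs)
            return torch.autograd.grad(out2, (qs, ks, vs), grad_out)

q, k, v = (torch.randn(2, 4, 16, 8, requires_grad=True) for _ in range(3))
FlashAttnSketch.apply(q, k, v).sum().backward()
```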

# Summary

### Current Status

- [PR](https://github.com/pytorch/pytorch/pull/118935) aims to update the flash attention kernels to a more recent version.
- The update is **partially blocked** due to a dependency on...

ciflow/trunk
topic: not user facing
ciflow/rocm

# Summary Updates FlashAttention kernel code from tag [2.3.6](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.3.6) to [2.5.5](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.5.5). The usual changes were then re-applied on top of the modified kernel, changing how dropout is saved for backward, removing...

module: cuda
ciflow/trunk
release notes: nn
skip-pr-sanity-checks