Driss Guessous
## Summary To better enable users to write Triton kernels on nested tensors, having empty_like support is helpful. This will be used to create a nested_tensor output buffer to...
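As a hedged illustration of the intended use, the sketch below allocates a nested-tensor output buffer with empty_like; the sequence shapes and values are made up.

```python
import torch

# Illustrative only: a nested tensor of two variable-length sequences.
nt = torch.nested.nested_tensor([torch.randn(3, 8), torch.randn(5, 8)])

# empty_like allocates an uninitialized nested tensor with the same
# nested sizes, ready to serve as an output buffer for a kernel.
out = torch.empty_like(nt)
```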
# Summary This change fixes a bug that was encountered when trying to add more backward formulas for nested tensor ops. If a derivative is defined that stores the "result"...
# Summary This exposes the _scaled_dot_product_attention function to Python in the nn namespace. It is still underscored because the API for args and kwargs is still in flux for the...
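A usage sketch, written against today's stable public spelling; the PR-era entry point was the underscored _scaled_dot_product_attention with a similar but still-evolving signature (an assumption), so treat the call shape below as illustrative.

```python
import torch
import torch.nn.functional as F

# Shapes are (batch, heads, seq_len, head_dim); values are made up.
q = torch.randn(2, 4, 16, 8)
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

# Public spelling shown here; the PR exposed the underscored
# predecessor in the nn namespace.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```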
# Summary This shows the run mechanism for dispatching to a custom backend kernel or defaulting to the composite version for SDP. Currently we short-circuit to the composite implementation, but in...
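A hypothetical pure-Python sketch of the selection logic being described; the real dispatch lives in C++, and every helper name below is illustrative, not the actual API.

```python
import math
import torch

def composite_sdp(q, k, v):
    # The composite (math) fallback: explicit softmax(QK^T / sqrt(d)) @ V.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

def fused_kernel_viable(q, k, v):
    # Placeholder constraint check; the real gating inspects dtype,
    # device, head dim, etc. Always returning False mirrors the
    # current short-circuit to the composite path.
    return False

def run_sdp(q, k, v):
    if fused_kernel_viable(q, k, v):
        raise NotImplementedError("fused path not wired up in this sketch")
    return composite_sdp(q, k, v)

out = run_sdp(*(torch.randn(2, 4, 16, 8) for _ in range(3)))
```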
# Summary At the start of the fused-code integration, Windows was not supported. This PR removes that limitation.
# Summary In preparation for the PyTorch 2.0 launch, this PR adjusts the private function API and makes it public. TODO: fill out the rest of the description
# Summary This PR creates the _flash_attention_backward and _scaled_dot_product_flash_attention_backward native functions and registers them in derivatives.yaml. The goal is to replicate the torch.autograd.Function defined in the FlashAttention repo [here](https://github.com/HazyResearch/flash-attention/blob/33e0860c9c5667fded5af674882e731909096a7f/flash_attn/flash_attn_interface.py#L126)...
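For orientation, here is a minimal sketch of the torch.autograd.Function pattern being moved into native functions. It is written against the composite math formulation rather than the fused CUDA kernels, and none of it is the actual FlashAttention code.

```python
import math
import torch

class SDPAFunction(torch.autograd.Function):
    """Sketch of the forward/backward pairing the native functions replace."""

    @staticmethod
    def forward(ctx, q, k, v):
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1
        )
        ctx.save_for_backward(q, k, v, attn)
        return attn @ v

    @staticmethod
    def backward(ctx, grad_out):
        q, k, v, attn = ctx.saved_tensors
        scale = 1.0 / math.sqrt(q.size(-1))
        grad_v = attn.transpose(-2, -1) @ grad_out
        grad_attn = grad_out @ v.transpose(-2, -1)
        # Softmax backward: dS = P * (dP - sum(dP * P, dim=-1)).
        grad_scores = attn * (
            grad_attn - (grad_attn * attn).sum(dim=-1, keepdim=True)
        )
        grad_q = grad_scores @ k * scale
        grad_k = grad_scores.transpose(-2, -1) @ q * scale
        return grad_q, grad_k, grad_v

# Sanity-check the hand-written backward against numerical gradients.
q, k, v = (
    torch.randn(2, 2, 5, 4, dtype=torch.double, requires_grad=True)
    for _ in range(3)
)
assert torch.autograd.gradcheck(SDPAFunction.apply, (q, k, v))
```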
# Summary WIP still
# Summary ### Current Status - [PR](https://github.com/pytorch/pytorch/pull/118935) aims to update the flash attention kernels to a more recent version. - The update is **partially blocked** due to a dependency on...
# Summary Updates the FlashAttention kernel code from tag [2.3.6](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.3.6) to [2.5.5](https://github.com/Dao-AILab/flash-attention/releases/tag/v2.5.5). The usual changes were then re-applied on top of the updated kernel, changing how dropout state is saved for backward, removing...
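As a usage note, a hedged sketch of forcing the flash backend so the updated kernels are the ones exercised, using the sdpa_kernel context manager available in recent PyTorch releases; it requires a CUDA device and fp16/bf16 inputs, and the shapes are made up.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Requires CUDA; the flash backend supports fp16/bf16 inputs.
q, k, v = (
    torch.randn(2, 4, 128, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)

# Restrict dispatch to the flash-attention kernel for this region.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```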