Driss Guessous
@pytorchbot merge -l
@prav2019 So what exactly solved your OOM issue, was it setting the max-requests?
Notes:
- Did not implement context manager versions of use_fused and use_math
- Not tested (YAY!). Not totally true, since we have test_transformers, which exercises the fused_sdp function. However we do...
This branch has been corrupted. Let's try this again: #85880
fixes: #85064
@cpuhrsch the one user of _scaled_dot_product_attention is multi_head_attention_forward, and the change broke it with:
```
File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 5170, in multi_head_attention_forward
    attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
RuntimeError: _scaled_dot_product_attention:...
```
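For context, the computation behind `_scaled_dot_product_attention` is `softmax(QK^T / sqrt(d)) V`. A minimal, dependency-free sketch of that math (the function name and list-of-lists layout here are illustrative only, not the actual torch API, and this omits the attn_mask and dropout_p arguments from the call above):

```python
import math

def sdpa_sketch(q, k, v):
    """Scaled dot-product attention on plain Python lists.

    q, k: seq_len x d row vectors; v: seq_len x d_v row vectors.
    Returns softmax(q @ k.T / sqrt(d)) @ v, computed row by row.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled attention scores of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention-weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

With identical keys the softmax weights are uniform, so the output row is the plain average of the value rows.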
@cpuhrsch yeah, the onnx failure is related. Need to figure out where the docs are on that one. The other test failure:
```
AssertionError: The supported dtypes for nn.functional._scaled_dot_product_attention on device type...
```
@pytorchbot merge -l
@pytorchbot merge -f "Onnx failure is unrelated to my change"