Yifu Wang
Thanks for the great write-up @cbalioglu! > Q6: naive comment for the writing API, I feel store might be something close to torch.device primitives, would it be possible to...
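A minimal sketch of the primitive being discussed, assuming "store" here refers to torch.distributed's key-value Store (TCPStore shown below; the host, port, and keys are purely illustrative):

```
# Hypothetical illustration, assuming "store" means torch.distributed's
# key-value Store primitive. Host/port/keys are made up for the sketch.
from datetime import timedelta

import torch.distributed as dist

# Single-process demo: rank 0 hosts the store; real workers would
# connect with is_master=False and the same host/port.
store = dist.TCPStore(
    host_name="127.0.0.1",
    port=29500,
    world_size=1,
    is_master=True,
    timeout=timedelta(seconds=30),
    wait_for_workers=False,  # don't block waiting for peers in this demo
)

store.set("status", "ready")   # write a key
value = store.get("status")    # blocking read; returns bytes
print(value)                   # b'ready'
```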
@pytorchbot merge
@pytorchbot merge
Hey @sanchitintel, any reason for closing this PR?
@pytorchbot merge
@pytorchbot merge
The PR doesn't seem mergeable anymore due to the previous abnormal revert. Re-opened in https://github.com/pytorch/pytorch/pull/121043
Hi @vmoens, thank you so much for raising the issue and drafting a proposal! Previously, in order to reduce the memory footprint on load, we would find the tensors in...
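For illustration only (this is not the internal mechanism being described above, which the snippet cuts off): one user-facing way to reduce the load-time footprint is memory-mapped loading, assuming a checkpoint saved with torch.save's default zipfile format:

```
# Illustrative sketch, not the internal tensor-finding logic discussed
# above. torch.load(..., mmap=True) memory-maps the checkpoint file so
# storages are paged in lazily rather than copied into RAM up front.
import torch

state = {"w": torch.randn(4, 4)}
torch.save(state, "ckpt.pt")

# map_location="cpu" keeps storages on CPU; mmap=True requires the
# default zipfile serialization format used by torch.save.
loaded = torch.load("ckpt.pt", map_location="cpu", mmap=True)
print(loaded["w"].shape)  # torch.Size([4, 4])
```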
Your repro works for me with pytorch-nightly. TORCH_COMPILE_DEBUG gives me this: ``` def forward(self, arg0_1: "bf16[1, 2, 2, 4]", arg1_1: "bf16[1, 5, 2, 4]", arg2_1: "bf16[1, 5, 2, 4]", arg3_1:...
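A minimal sketch of how that debug output can be captured, assuming a compiled SDPA call with bf16 inputs shaped like the graph signature above (the actual repro from the issue is not shown in the thread):

```
# Hypothetical repro sketch; shapes are taken from the debug signature
# above, and the (batch, seq, heads, dim) layout is an assumption.
# Run with: TORCH_COMPILE_DEBUG=1 python repro.py
import torch
import torch.nn.functional as F

def fn(q, k, v):
    # SDPA expects (batch, heads, seq, dim), hence the transposes.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    return F.scaled_dot_product_attention(q, k, v)

q = torch.randn(1, 2, 2, 4, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 5, 2, 4, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 5, 2, 4, device="cuda", dtype=torch.bfloat16)

compiled = torch.compile(fn)
out = compiled(q, k, v)  # debug artifacts land in ./torch_compile_debug/
```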
@drisspg I tested on a V100. Both eager and compiled run into the same error. I think the issue is that mem_eff_attention doesn't support bf16 on sm < 80: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF.h#L286...
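A hedged sketch of how one might guard against this locally, checking the compute capability before using bf16 with SDPA (the sm80 threshold comes from the cutlassF.h link above; this is a workaround sketch, not a fix for the kernel gap):

```
# Illustrative workaround: fall back to fp16 on pre-sm80 GPUs
# (e.g. V100 is sm70), where mem-efficient attention lacks bf16 kernels.
import torch
import torch.nn.functional as F

major, _minor = torch.cuda.get_device_capability()
dtype = torch.bfloat16 if major >= 8 else torch.float16

q, k, v = (torch.randn(1, 2, 2, 4, device="cuda", dtype=dtype)
           for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v)
```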