Yifu Wang

29 comments by Yifu Wang

Thanks for the great write-up @cbalioglu! > Q6: naive comment for the writing API, I feel store might be something close to torch.device primitives, would it be possible to...

Hey @sanchitintel, any reason for closing this PR?

The PR doesn't seem mergeable anymore due to the previous abnormal revert. Re-opened in https://github.com/pytorch/pytorch/pull/121043

Hi @vmoens, thank you so much for raising the issue and drafting a proposal! Previously, in order to reduce the memory footprint on load, we would find the tensors in...

Your repro works for me with pytorch-nightly. TORCH_COMPILE_DEBUG gives me this: ``` def forward(self, arg0_1: "bf16[1, 2, 2, 4]", arg1_1: "bf16[1, 5, 2, 4]", arg2_1: "bf16[1, 5, 2, 4]", arg3_1:...

@drisspg I tested on a V100. Both the eager and compiled runs hit the same error. I think the issue is that mem_eff_attention doesn't support bf16 on sm < 80: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF.h#L286...