Yifu Wang
Thanks for the great write-up @cbalioglu! > Q6: naive comment for the writing API, I feel store might be something close to torch.device primitives, would it be possible to...
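A minimal sketch of the primitive being discussed, assuming "store" here refers to torch.distributed's key-value Store (TCPStore shown below; the host, port, and keys are purely illustrative):

```
# Hypothetical illustration, assuming "store" means torch.distributed's
# key-value Store primitive. Host/port/keys are made up for the sketch.
from datetime import timedelta

import torch.distributed as dist

# Single-process demo: rank 0 hosts the store; real workers would
# connect with is_master=False and the same host/port.
store = dist.TCPStore(
    host_name="127.0.0.1",
    port=29500,
    world_size=1,
    is_master=True,
    timeout=timedelta(seconds=30),
    wait_for_workers=False,  # don't block waiting for peers in this demo
)

store.set("status", "ready")   # write a key
value = store.get("status")    # blocking read; returns bytes
print(value)                   # b'ready'
```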
@pytorchbot merge
@pytorchbot merge
Hey @sanchitintel, any reason for closing this PR?
@pytorchbot merge
@pytorchbot merge
The PR doesn't seem mergeable anymore due to the previous abnormal revert. Re-opened in https://github.com/pytorch/pytorch/pull/121043
Hi @vmoens, thank you so much for raising the issue and drafting a proposal! Previously, in order to reduce the memory footprint on load, we would find the tensors in...
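For illustration only (this is not the internal mechanism being described above, which the snippet cuts off): one user-facing way to reduce the load-time footprint is memory-mapped loading, assuming a checkpoint saved with torch.save's default zipfile format:

```
# Illustrative sketch, not the internal tensor-finding logic discussed
# above. torch.load(..., mmap=True) memory-maps the checkpoint file so
# storages are paged in lazily rather than copied into RAM up front.
import torch

state = {"w": torch.randn(4, 4)}
torch.save(state, "ckpt.pt")

# map_location="cpu" keeps storages on CPU; mmap=True requires the
# default zipfile serialization format used by torch.save.
loaded = torch.load("ckpt.pt", map_location="cpu", mmap=True)
print(loaded["w"].shape)  # torch.Size([4, 4])
```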
Your repro works for me with pytorch-nightly. TORCH_COMPILE_DEBUG gives me this: ``` def forward(self, arg0_1: "bf16[1, 2, 2, 4]", arg1_1: "bf16[1, 5, 2, 4]", arg2_1: "bf16[1, 5, 2, 4]", arg3_1:...
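A minimal sketch of how that debug output can be captured, assuming a compiled SDPA call with bf16 inputs shaped like the graph signature above (the actual repro from the issue is not shown in the thread):

```
# Hypothetical repro sketch; shapes are taken from the debug signature
# above, and the (batch, seq, heads, dim) layout is an assumption.
# Run with: TORCH_COMPILE_DEBUG=1 python repro.py
import torch
import torch.nn.functional as F

def fn(q, k, v):
    # SDPA expects (batch, heads, seq, dim), hence the transposes.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    return F.scaled_dot_product_attention(q, k, v)

q = torch.randn(1, 2, 2, 4, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 5, 2, 4, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 5, 2, 4, device="cuda", dtype=torch.bfloat16)

compiled = torch.compile(fn)
out = compiled(q, k, v)  # debug artifacts land in ./torch_compile_debug/
```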
@drisspg I tested on a V100. Both eager and compiled run into the same error. I think the issue is that mem_eff_attention doesn't support bf16 on sm < 80: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassF.h#L286...
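A hedged sketch of how one might guard against this locally, checking the compute capability before using bf16 with SDPA (the sm80 threshold comes from the cutlassF.h link above; this is a workaround sketch, not a fix for the kernel gap):

```
# Illustrative workaround: fall back to fp16 on pre-sm80 GPUs
# (e.g. V100 is sm70), where mem-efficient attention lacks bf16 kernels.
import torch
import torch.nn.functional as F

major, _minor = torch.cuda.get_device_capability()
dtype = torch.bfloat16 if major >= 8 else torch.float16

q, k, v = (torch.randn(1, 2, 2, 4, device="cuda", dtype=dtype)
           for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v)
```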