Ke Wen

65 comments by Ke Wen

> For all the other cases, numel() < nranks case is handled in here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/placement_types.py#L59-L61
Not sure how the pointed assert handles the case -- it only checks `sharder.dim` against...

Thanks, we will improve the documentation.

The reason for the hang is complicated, and yes, it is related to the code you refer to (guessing the device). There are two ways to work around it: 1. Pass a...

Good idea. Will doc it at the `pipeline` API level (unflattener is private).

Thanks @lessw2020. Do you think the illegal memory access (IMA) is related to the Triton kernel? Can you help fix it? PP needs this fix to land. Would appreciate your help.

Thanks @lessw2020 for the demonstration.
> some kind of bug between triton load masking and what is going awry when run as a custom op
Can you point me to...

I'd vote for deprecating the tutorial, since nobody maintains the software or the tutorial anymore.

I believe in the old FSDP, where the FSDP API is called on the whole model, `reshard_after_forward` can be automatically figured out (or at least there is a way to do so)....