Theodor Badea
https://github.com/mlcommons/chakra/pull/190
@jinsun-yoo thanks for raising this, yesterday I struggled a lot with this issue and I really did not know if it's something on my end or not. As you have...
@jinsun-yoo I noticed you've addressed the protobuf version in PR#202. Can this issue be closed?
Hey @JoongunPark, thanks for your feedback on this PR. That check is redundant because child nodes are only added to the stack if they are not in the visited...
@9LLPPLL6 my understanding is that two collectives from the same PG cannot overlap, and in practice, at least in NCCL's case, they don't, because NCCL takes care of scheduling them....
Hey @spandoescode, I put “draft” in the first comment because this PR contains only some commented-out code and I would like to use it just to support my...
Hey, @jinsun-yoo. Indeed, since the parent is missing, it will not depend on any compute node. This is basically the reason why I initiated this discussion, to get feedback...
@jinsun-yoo I managed to reproduce the situation you were targeting with your comment when training AlexNet with FSDP. This is what the original .et looks like:  This is what...
Hey, @JoongunPark. After some more digging, I found out that what's captured in the resulting chakra.et corresponds to thread tid=2 and what's missing corresponds to thread tid=1. About your question,...
Possibly related to https://github.com/mlcommons/chakra/issues/186