Theodor Badea

Results 15 comments of Theodor Badea

Hey @JoongunPark, thanks for sharing this. As far as I can see, @briancoutinho did not align, i.e. increment, the external id, but added record function id to the p...

Hey @JoongunPark. Yep, I think in order to have this fixed and also have a robust linkage, PyTorch needs to also dump the rf id for kernel nodes. I...

Hey @sunboyZgz. Can you share the code you used to capture the traces, specifically the profiler part? It would be interesting to see how you ended up with only CPU nodes.

@32HD can you please try https://github.com/mlcommons/chakra/pull/190 ? It may be related.

Can you check your Kineto trace to see whether it contains a cudaLaunchKernelExC event with the same correlation ID as your failing collective? ![Image](https://github.com/user-attachments/assets/f28af95b-b68e-4468-925a-f232f3965c76)
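
If it helps, here is a rough sketch of how that check could be scripted rather than done by eye. It assumes the Kineto trace is a Chrome-trace-style JSON file with a top-level `traceEvents` list, and that each event carries its correlation under `args["correlation"]`; please verify both against your own file. The helper names (`load_trace`, `find_runtime_launches`) and the sample events are made up for illustration and are not part of Kineto or Chakra.

```python
import json
from typing import Any


def load_trace(path: str) -> dict[str, Any]:
    """Load a Kineto JSON trace from disk (assumes an uncompressed .json file)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def find_runtime_launches(
    trace: dict[str, Any],
    correlation: int,
    name: str = "cudaLaunchKernelExC",
) -> list[dict[str, Any]]:
    """Return events called `name` whose args carry the given correlation.

    Assumption: events live under "traceEvents" and store the correlation
    in event["args"]["correlation"].
    """
    matches = []
    for event in trace.get("traceEvents", []):
        args = event.get("args") or {}
        if event.get("name") == name and args.get("correlation") == correlation:
            matches.append(event)
    return matches


# Tiny made-up trace, only to show the expected shape of the data.
sample_trace = {
    "traceEvents": [
        {"name": "cudaLaunchKernelExC", "cat": "cuda_runtime",
         "args": {"correlation": 42}},
        {"name": "hypothetical_collective_kernel", "cat": "kernel",
         "args": {"correlation": 42}},
        {"name": "cudaLaunchKernel", "cat": "cuda_runtime",
         "args": {"correlation": 7}},
    ]
}

launches = find_runtime_launches(sample_trace, correlation=42)
print(len(launches))
```

On a real file you would call `find_runtime_launches(load_trace("your_trace.json"), correlation=<value from the failing collective>)`; an empty result would suggest the runtime launch event is missing from the trace.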