Pinle Liu

9 comments by Pinle Liu

> > This code terminates without error and the results are running well on my ASTRA-Sim.
>
> After discussion with the group in our last Chakra meeting, we concluded...

> Currently, there is no explicit dependency between collectives belonging to the same process group. Their sequential execution is deduced only from the duration of their parent compute node and...

> @9LLPPLL6 my understanding is that two collectives from the same PG cannot overlap, and in reality, at least in NCCL's case, they don't because NCCL takes care of scheduling...
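
One way to make the same-PG ordering explicit, instead of inferring it from the duration of the parent compute node, is to chain each collective to the previous one issued in the same process group. The sketch below only illustrates that idea: `CommNode`, its fields (`pg_name`, `issue_order`, `deps`), and `serialize_collectives_per_pg` are hypothetical stand-ins, not Chakra's actual execution-trace schema or converter API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CommNode:
    """Hypothetical stand-in for a collective node in an execution trace."""
    node_id: int
    pg_name: str      # process group the collective belongs to (assumed field)
    issue_order: int  # order in which the rank issued the collective (assumed field)
    deps: list = field(default_factory=list)  # IDs of nodes this one waits on


def serialize_collectives_per_pg(nodes):
    """Add an explicit dependency from each collective to the previous
    collective issued in the same process group, so collectives in one PG
    cannot overlap regardless of compute-node durations.
    Collectives in different PGs are left independent."""
    by_pg = defaultdict(list)
    for node in nodes:
        by_pg[node.pg_name].append(node)
    for group in by_pg.values():
        # Sort a per-PG copy by issue order; the caller's list is untouched.
        group.sort(key=lambda n: n.issue_order)
        for prev, cur in zip(group, group[1:]):
            # Skip duplicates so calling the function twice is harmless.
            if prev.node_id not in cur.deps:
                cur.deps.append(prev.node_id)
```

For example, with collectives 1 and 3 in `pg0` and collective 2 in `pg1`, only node 3 gains a dependency (on node 1); the `pg1` collective remains free to overlap with both.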

> I did try this for a single epoch though.

Hello, have you solved this problem yet?

Hello, I encountered a similar problem. May I ask what your specific problem was and how you solved it? Thanks!

> > Hello, I encountered a similar problem. May I ask what your specific problem was and how you solved it? Thanks!
>
> Hello, I haven't found a good solution...

@TaekyungHeo @srinivas212 @rvinaybharadwaj @AlexandruAntonescuKeysight @JoongunPark @tushar-krishna @nathanw-mlc

> Can you check your Kineto trace to see whether you can find such a `cudaLaunchKernelExC` call with the same correlation ID as your failing collective?
>
> ![Image](https://github.com/user-attachments/assets/f28af95b-b68e-4468-925a-f232f3965c76)

Some of my computing kernels do...
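
The check suggested in the quoted comment can be scripted. This is a minimal sketch, assuming the Chrome-trace-style JSON that Kineto usually exports: a top-level `traceEvents` list in which each event has a `name` and an `args` dict carrying the correlation ID under `"correlation"`. Those field names may differ between PyTorch versions, and `load_trace` and `find_events_by_correlation` are hypothetical helper names.

```python
import json


def load_trace(path):
    """Load a Kineto trace exported as JSON (assumed Chrome-trace layout)."""
    with open(path) as f:
        return json.load(f)


def find_events_by_correlation(trace, correlation_id, name="cudaLaunchKernelExC"):
    """Return events called `name` whose args["correlation"] equals
    `correlation_id`. An empty result means no such launch was recorded
    for that correlation ID."""
    return [
        ev
        for ev in trace.get("traceEvents", [])
        if ev.get("name") == name
        and ev.get("args", {}).get("correlation") == correlation_id
    ]
```

Typical use would be `find_events_by_correlation(load_trace("trace.json"), cid)`, where `cid` is the correlation ID of the failing collective's kernel; passing `name="cudaLaunchKernel"` checks the non-`ExC` launch variant instead.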

I also encountered this problem. Do you know what is causing it?