Pinle Liu
> > This code terminates without error and the results are running well on my ASTRA-Sim. > > After discussion with the group in our last Chakra meeting, we concluded...
> Currently, there is no explicit dependency between collectives belonging to the same process group. Their sequential execution is deduced only from the duration of their parent compute node and...
> @9LLPPLL6 my understanding is that two collectives from the same PG cannot overlap, and in reality, at least in NCCL's case, they don't, because NCCL takes care of scheduling...
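The idea discussed above, making the ordering of collectives in the same process group explicit instead of inferring it from the parent compute node's duration, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Chakra's actual schema or converter: `CommNode`, its fields, and `chain_collectives_per_pg` are hypothetical, and it assumes the observed start time reflects the issue order of collectives within a PG (consistent with the point that collectives from the same PG do not overlap).

```python
from dataclasses import dataclass, field


@dataclass
class CommNode:
    # Hypothetical, simplified stand-in for a collective node in an execution trace.
    node_id: int
    pg_name: str      # process group the collective belongs to
    start_us: float   # observed start time, used only to recover issue order
    ctrl_deps: list[int] = field(default_factory=list)


def chain_collectives_per_pg(nodes: list[CommNode]) -> None:
    """Add an explicit control dependency from each collective to the
    previous collective in the same process group (ordered by start time)."""
    last_in_pg: dict[str, CommNode] = {}
    for node in sorted(nodes, key=lambda n: (n.start_us, n.node_id)):
        prev = last_in_pg.get(node.pg_name)
        # Only collectives sharing a PG are chained; different PGs may still overlap.
        if prev is not None and prev.node_id not in node.ctrl_deps:
            node.ctrl_deps.append(prev.node_id)
        last_in_pg[node.pg_name] = node
```

With this approach a simulator would no longer depend on compute-node durations to keep same-PG collectives sequential, while collectives in different PGs remain free to overlap.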
> I did try this for a single epoch though.

Hello, have you solved this problem yet?
Hello, I encountered a similar problem. May I ask what your specific problem was and how you solved it? Thanks!
> > Hello, I encountered a similar problem. May I ask what your specific problem was and how you solved it? Thanks! > > Hello, I haven't found a good solution...
@TaekyungHeo @srinivas212 @rvinaybharadwaj @AlexandruAntonescuKeysight @JoongunPark @tushar-krishna @nathanw-mlc
> Can you check your Kineto trace to see whether you can find a cudaLaunchKernelExC with the same correlation ID as your failing collective? > >  Some of my computing kernels do...
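The check suggested above can be scripted instead of done by hand. The sketch below assumes the Kineto trace was exported as Chrome-trace-style JSON with a top-level `traceEvents` list, where runtime and kernel events carry `args.correlation`; the helper names `load_trace`, `find_events_by_correlation`, and `has_launch_for` are hypothetical, and field names may differ across PyTorch/Kineto versions.

```python
import json
from typing import Any


def load_trace(path: str) -> dict[str, Any]:
    # Assumes a Chrome-trace-format JSON file as exported by the profiler.
    with open(path) as f:
        return json.load(f)


def find_events_by_correlation(trace: dict[str, Any], correlation: int) -> list[dict[str, Any]]:
    """Return every event whose args.correlation equals `correlation`."""
    return [
        ev for ev in trace.get("traceEvents", [])
        if isinstance(ev.get("args"), dict) and ev["args"].get("correlation") == correlation
    ]


def has_launch_for(trace: dict[str, Any], correlation: int,
                   launch_name: str = "cudaLaunchKernelExC") -> bool:
    """True if a runtime call named `launch_name` shares the given correlation ID."""
    return any(ev.get("name") == launch_name
               for ev in find_events_by_correlation(trace, correlation))
```

Usage: take the correlation ID from the failing collective's kernel event, then call `has_launch_for(load_trace("trace.json"), corr_id)`; listing `find_events_by_correlation(...)` shows which launch API (if any) that kernel was actually paired with.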
I also encountered this problem. Do you know what the cause is?