A question about the Horovod central coordinator in the KungFu paper

Open JohanOu opened this issue 4 years ago • 2 comments

The asynchronous collective communication layer also avoids having an expensive central coordinator, as used for invoking synchronous collective communication operations in existing systems, such as Horovod.

I have read the Horovod and KungFu papers, and I wonder why Horovod uses a central coordinator. I haven't found it in the Horovod paper. Could you please give me some information about it, such as the relevant code? I want to compare the differences.

Thanks! Have a nice day!

JohanOu avatar Aug 19 '21 09:08 JohanOu

Is this what you are looking for? https://github.com/horovod/horovod/blob/master/horovod/common/operations.cc#L359-L378

lgarithm avatar Aug 19 '21 14:08 lgarithm

Thanks! I see the AD-PSGD algorithm in the code. Does it relate to the collective communication layer described in the paper?

JohanOu avatar Sep 07 '21 11:09 JohanOu
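
For readers comparing the two designs, here is a minimal sketch of the coordinator pattern this thread asks about. It assumes, as I understand the Horovod source linked above, that one rank collects "tensor ready" messages from every worker and releases a tensor for the collective operation only once all ranks have reported it. This is a single-process simulation, not Horovod's C++ implementation; the names `Coordinator`, `report`, and `negotiate` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class Coordinator:
    """Hypothetical central coordinator (not Horovod's real API).

    Tracks which ranks have reported each tensor as ready.
    """

    def __init__(self, world_size):
        self.world_size = world_size
        self.ready = defaultdict(set)  # tensor name -> ranks that reported it

    def report(self, rank, tensor_name):
        """Record a readiness message; return True once every rank has reported."""
        self.ready[tensor_name].add(rank)
        return len(self.ready[tensor_name]) == self.world_size


def negotiate(world_size, arrivals):
    """Replay (rank, tensor_name) readiness messages in arrival order.

    Returns the order in which the coordinator would release tensors
    for the synchronous collective operation.
    """
    coordinator = Coordinator(world_size)
    released = []
    for rank, name in arrivals:
        if coordinator.report(rank, name):
            released.append(name)
    return released


# Two workers whose gradients become ready in different orders.
arrivals = [(0, "grad_a"), (1, "grad_b"), (1, "grad_a"), (0, "grad_b")]
order = negotiate(2, arrivals)
print(order)
```

Every readiness message passes through the single coordinator, which is the cost the quoted sentence from the KungFu paper refers to. A decentralized design would have to agree on the operation order without this central bookkeeping step.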