gloo
Problem with the performance of All_to_All communication
Hi, when I use Gloo AlltoAll in my work, the performance is much slower than expected. Here is a test from my environment.
rank_num : 2
element_per_rank  time_cost  speed
50000             0.042867s  4.67MB/s
100000            0.08761s   4.57MB/s
500000            0.350037s  5.71MB/s
1000000           2.96007s   1.35MB/s
2000000           8.18468s   0.977MB/s
5000000           24.6651s   0.811MB/s
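For anyone reproducing these numbers: the speed column appears consistent with element_per_rank * 4 bytes / time_cost, with 1 MB taken as 10^6 bytes. The sketch below shows that arithmetic. The helper name ThroughputMBps and the 4-byte element size are assumptions for illustration (the test code uses int); they are not part of Gloo.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a Gloo API.
// Assumptions: each element is a 4-byte int and 1 MB = 10^6 bytes.
double ThroughputMBps(std::size_t elements_per_rank, double seconds,
                      std::size_t bytes_per_element = 4) {
  assert(seconds > 0.0);  // a zero or negative duration is meaningless here
  const double bytes =
      static_cast<double>(elements_per_rank) * bytes_per_element;
  return bytes / seconds / 1e6;
}
```

For example, ThroughputMBps(50000, 0.042867) gives roughly 4.67, which matches the first row.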
As shown, once the number of elements per rank reaches 1 million, AlltoAll slows down noticeably. Has anyone else run into this problem? I am not sure whether something is wrong with my usage or whether AlltoAll is simply expected to be this slow.
Here is my test code for AllToAll:
TEST(GlooCommTest, AllToAll) {
  GlooComm gloo_comm(g_nranks, g_rank);
  gloo_comm.Initialize("127.0.0.1", 12345, "127.0.0.1");

  const size_t stride = 1000000;  // elements sent to each rank
  std::vector<int> send(g_nranks * stride);
  std::vector<int> recv(g_nranks * stride);

  // Fill the chunk destined for rank i with the value i.
  for (size_t i = 0; i < g_nranks; ++i) {
    for (size_t j = 0; j < stride; ++j) {
      send[stride * i + j] = static_cast<int>(i);
    }
  }

  gloo_comm.AllToAll(send.data(), recv.data(), stride, 30);

  // Every peer filled the chunk it sent to this rank with this rank's id.
  for (size_t i = 0; i < g_nranks; ++i) {
    for (size_t j = 0; j < stride; ++j) {
      ASSERT_EQ(recv[stride * i + j], static_cast<int>(g_rank));
    }
  }
}

// GlooComm::AllToAll, a thin wrapper around gloo::alltoall.
// send_cnt_each is the element count per rank; timeout is in seconds.
template <typename T>
void AllToAll(T* send, T* recv, size_t send_cnt_each, size_t timeout) {
  gloo::AlltoallOptions opts(gloo_context_);
  opts.setInput(send, send_cnt_each * nranks_);
  opts.setOutput(recv, send_cnt_each * nranks_);
  opts.setTimeout(std::chrono::milliseconds(timeout * 1000));
  gloo::alltoall(opts);
}
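
The post does not show how time_cost was measured. One way to time only the collective call, leaving out buffer setup and the verification loop, is sketched below. TimeSeconds is a hypothetical helper name, not part of Gloo or the test above, and it uses only the standard library.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper, not a Gloo API: runs fn once and returns the
// elapsed wall-clock time in seconds, measured with a monotonic clock.
template <typename Fn>
double TimeSeconds(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

In the test above it could wrap just the collective, for example TimeSeconds([&] { gloo_comm.AllToAll(send.data(), recv.data(), stride, 30); }), so that the reported time covers the AllToAll call and nothing else.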
I have the same problem. Is there any response?