
Problem with the performance of All_to_All communication

Open • ZidongGao opened this issue 2 years ago • 1 comment

Hi, when I use Gloo's AlltoAll in my work, the performance is much worse than expected. Here is a test from my environment:

rank_num: 2

| element_per_rank | time_cost (s) | speed (MB/s) |
|---:|---:|---:|
| 50000 | 0.042867 | 4.67 |
| 100000 | 0.08761 | 4.57 |
| 500000 | 0.350037 | 5.71 |
| 1000000 | 2.96007 | 1.35 |
| 2000000 | 8.18468 | 0.977 |
| 5000000 | 24.6651 | 0.811 |

As shown, once the number of elements per rank reaches 1 million, AlltoAll slows down sharply: throughput drops from about 5 MB/s to between 0.8 and 1.4 MB/s. Has anyone else run into this? I'm not sure whether something is wrong with my usage or whether AlltoAll is simply expected to be this slow.
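The speed column is consistent with element_per_rank × sizeof(int) / time_cost, taking sizeof(int) = 4 bytes and 1 MB = 10^6 bytes (for example, 50000 × 4 B / 0.042867 s ≈ 4.67 MB/s). The small sketch below assumes that definition and a 4-byte `int`, recomputes the column, and also prints the time per element. `Measurement`, `SpeedMBps`, `MatchesReported` and `PrintSpeeds` are hypothetical helpers for illustration, not part of Gloo.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// One row of the table: elements per rank, measured time, reported speed.
struct Measurement {
  std::size_t elements_per_rank;
  double seconds;
  double reported_mbps;
};

// Assumed definition of "speed": bytes sent to one rank divided by time,
// with int elements (4 bytes on common platforms) and 1 MB = 1e6 bytes.
double SpeedMBps(const Measurement& m) {
  assert(m.seconds > 0.0);
  const double bytes = static_cast<double>(m.elements_per_rank) * sizeof(int);
  return bytes / m.seconds / 1e6;
}

// True if the recomputed speed is within 0.5% of the reported value.
bool MatchesReported(const Measurement& m) {
  return std::fabs(SpeedMBps(m) - m.reported_mbps) <= 0.005 * m.reported_mbps;
}

// The six rows reported in the issue.
const Measurement kReported[] = {
    {50000, 0.042867, 4.67},   {100000, 0.08761, 4.57},
    {500000, 0.350037, 5.71},  {1000000, 2.96007, 1.35},
    {2000000, 8.18468, 0.977}, {5000000, 24.6651, 0.811},
};

// Prints recomputed throughput and time per element for each row.
void PrintSpeeds() {
  for (const Measurement& m : kReported) {
    const double ns_per_element =
        m.seconds / static_cast<double>(m.elements_per_rank) * 1e9;
    std::printf("%8zu elements: %6.3f MB/s, %7.1f ns/element\n",
                m.elements_per_rank, SpeedMBps(m), ns_per_element);
  }
}
```

Per element, the cost is roughly 0.7 to 0.9 µs below 1 million elements and roughly 3 to 5 µs at and above it, which is the slowdown described above expressed independently of message size.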

Here is my AlltoAll test code:

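```cpp
// g_nranks / g_rank: number of ranks and this process's rank.
// GlooComm is my wrapper around a Gloo context (its AllToAll is shown below).
TEST(GlooCommTest, AllToAll) {
  GlooComm gloo_comm(g_nranks, g_rank);
  gloo_comm.Initialize("127.0.0.1", 12345, "127.0.0.1");
  const size_t stride = 1000000;  // elements sent to each rank
  std::vector<int> send(g_nranks * stride);
  std::vector<int> recv(g_nranks * stride);
  // Chunk i of the send buffer is destined for rank i and is filled with i.
  for (size_t i = 0; i < g_nranks; ++i) {
    for (size_t j = 0; j < stride; ++j) {
      send[stride * i + j] = i;
    }
  }

  gloo_comm.AllToAll(send.data(), recv.data(), stride, 30);  // 30 s timeout

  // Every chunk received by this rank should therefore contain g_rank.
  for (size_t i = 0; i < g_nranks; ++i) {
    for (size_t j = 0; j < stride; ++j) {
      ASSERT_EQ(recv[stride * i + j], static_cast<int>(g_rank));
    }
  }
}
```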
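```cpp
// Member of GlooComm: a thin wrapper around gloo::alltoall.
// Requires <chrono> and "gloo/alltoall.h"; gloo_context_ and nranks_ are members.
template <typename T>
void AllToAll(T* send, T* recv, size_t send_cnt_each, size_t timeout) {
  gloo::AlltoallOptions opts(gloo_context_);
  // Input and output each hold send_cnt_each elements per rank.
  opts.setInput(send, send_cnt_each * nranks_);
  opts.setOutput(recv, send_cnt_each * nranks_);
  opts.setTimeout(std::chrono::milliseconds(timeout * 1000));  // timeout in seconds
  gloo::alltoall(opts);
}
```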
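To make the test's expected result explicit, here is a single-process sketch of alltoall semantics as the test assumes them: chunk i of each rank's send buffer goes to rank i, and each rank's receive buffer is ordered by source rank. It models only the data movement, not Gloo's implementation or transport, so it can help separate a correctness question from a performance one. `SimulateAllToAll` and `TestPatternHolds` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-process model of alltoall semantics (not Gloo's implementation):
// chunk dst of rank src's send buffer becomes chunk src of rank dst's
// recv buffer. send[r] / recv[r] are rank r's flat buffers (nranks * stride).
template <typename T>
void SimulateAllToAll(const std::vector<std::vector<T>>& send,
                      std::vector<std::vector<T>>& recv, std::size_t stride) {
  const std::size_t nranks = send.size();
  recv.assign(nranks, std::vector<T>(nranks * stride));
  for (std::size_t src = 0; src < nranks; ++src) {
    assert(send[src].size() == nranks * stride);
    for (std::size_t dst = 0; dst < nranks; ++dst) {
      for (std::size_t k = 0; k < stride; ++k) {
        recv[dst][src * stride + k] = send[src][dst * stride + k];
      }
    }
  }
}

// Fills every rank's send buffer like the test does (chunk i holds value i)
// and reports whether every rank receives only its own rank id.
bool TestPatternHolds(std::size_t nranks, std::size_t stride) {
  std::vector<std::vector<int>> send(nranks, std::vector<int>(nranks * stride));
  for (std::size_t r = 0; r < nranks; ++r) {
    for (std::size_t i = 0; i < nranks; ++i) {
      for (std::size_t j = 0; j < stride; ++j) {
        send[r][stride * i + j] = static_cast<int>(i);
      }
    }
  }
  std::vector<std::vector<int>> recv;
  SimulateAllToAll(send, recv, stride);
  for (std::size_t r = 0; r < nranks; ++r) {
    for (int v : recv[r]) {
      if (v != static_cast<int>(r)) return false;
    }
  }
  return true;
}
```

Under these semantics, with rank_num = 2 and stride = 1000000, each rank's own chunk does not need to leave the process, and the other chunk (1,000,000 ints × 4 bytes = 4 MB) goes to the peer in each direction.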

ZidongGao, Aug 09 '22 07:08

I have the same problem. Has there been any response?

gavin1332, Aug 24 '22 02:08