mpi icon indicating copy to clipboard operation
mpi copied to clipboard

mpi::reduce hangs on intel MPI

Open francescopt opened this issue 4 years ago • 0 comments

The following program hangs on calling boost::mpi::reduce when run on an Intel MPI environment.

#include <algorithm>
#include <iostream>
#include <vector>
#include <boost/mpi/collectives.hpp>
#include <boost/mpi/operations.hpp>
#include <boost/serialization/vector.hpp>

struct sum_vec_vec {
  std::vector<double> operator()(const std::vector<double>& a, const std::vector<double>&b) const
  {
    std::vector<double> res(a.size());
    std::transform(a.begin(), a.end(), b.begin(), res.begin(), [](double x, double y) { return x + y; });
    return res;
  }
};

namespace boost {
  namespace mpi {
    template <>
    struct is_commutative<sum_vec_vec, std::vector<double>> : mpl::true_ { };
  }
}

int main()
{
  namespace mpi = boost::mpi;

  mpi::environment env;
  mpi::communicator world;

  std::size_t size = 1000;
  std::size_t L = 32;

  std::vector<std::vector<double>> correlations;

  // Fill up with some data
  for (std::size_t i = 0; i < size; ++i)
    {
      int l = 0;

      std::vector<double> corr(L);
      for (auto&x : corr)
        x = l++;
      correlations.emplace_back(std::move(corr));
    }

  std::vector<std::vector<double>> av_correlations(correlations.size());
  std::cout << "Ready for mpi::reduce" << std::endl;
  boost::mpi::reduce(world, &correlations.front(), correlations.size(), &av_correlations.front(), sum_vec_vec{}, 0);

  return 0;
}

The specific of the MPI environment are:

MPI_Get_library_version: Intel(R) MPI Library 2019 Update 6 for Linux* OS
MPI_VERSION: 3
I_MPI_NUMVERSION: 20190006300

Boost version is 1.74.0, and it defines:

BOOST_MPI_VERSION: 3
BOOST_MPI_USE_IMPROBE: 1

The program above very often hangs when run for example with >6 tasks, and always hangs when, say, Ntasks=192.

The reason apparently lies in the use of the MPI_Mprobe routines in point_to_point.cpp. On recompiling the library with the flag BOOST_MPI_USE_IMPROBE disabled in config.hpp, the program ends without issues. Also, the program runs without problems, on openmpi and enabled BOOST_MPI_USE_IMPROBE.

All in all, I suspect that the bug mentioned here is still there.

francescopt avatar Oct 09 '20 08:10 francescopt