
Wrong output for `igebs2d`, `igebr2d` (matrix broadcast) in ScaLAPACK

Open j7168908jx opened this issue 1 year ago • 0 comments

I am trying to write some C++ code that calls ScaLAPACK and I encountered this problem.

I reduced the problem to a minimal example: I want to broadcast a general integer matrix (here the special case of a 1x1 matrix) over a process grid, so that the process in each row of the grid receives it.

For example, the grid is 2 rows x 1 column, and process 1 wants to broadcast a value to process 0.
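To make the grid layout concrete, here is a minimal sketch of how ranks map to grid coordinates. It assumes that the row-major ("Row") ordering places rank `r` at row `r / npcol` and column `r % npcol`; `grid_coords` is a hypothetical helper written for illustration, not a BLACS function.

```cpp
#include <utility>

// Hypothetical helper (not part of BLACS): returns the (row, col)
// coordinates of `rank` in a grid with `npcol` columns, assuming
// ranks are laid out in row-major order.
constexpr std::pair<int, int> grid_coords(int rank, int npcol) {
  return {rank / npcol, rank % npcol};
}

// Compile-time check for the 2 x 1 grid described above:
// process 1 sits in row 1, column 0.
static_assert(grid_coords(1, 1).first == 1, "rank 1 is in row 1");
static_assert(grid_coords(1, 1).second == 0, "rank 1 is in column 0");
```

With `npcol = 1`, `grid_coords(0, 1)` gives row 0, column 0 and `grid_coords(1, 1)` gives row 1, column 0, so the sender in this example is the process at row 1, column 0.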

Here is a minimal example of about 50 lines that shows this weird result:

```cpp
// build with: (but version does not matter I think)
// mpicxx --std=c++17 test.cpp -L/opt/scalapack-2.2.0 -L/opt/LAPACK/3.10.0/ -L/opt/OpenBLAS/0.3.19/lib64 -lscalapack -llapack -lopenblas -lgfortran -o test.x
// run with:
// OMP_NUM_THREADS=1 LD_LIBRARY_PATH=/opt/LAPACK/3.10.0:/opt/OpenBLAS/0.3.19/lib64:/opt/MPICH/4.0.2/lib:$LD_LIBRARY_PATH mpirun -np 2 ./test.x

#include <cstdlib>
#include <cassert>
#include <string>
#include <iostream>
#include <fstream>
#include <iomanip>
#include <vector>
#include "mpi.h"
#include <Eigen/Dense>
#include <thread>
#include <chrono>

extern "C" {

  void Cblacs_get(const int ictxt, const int what, int *val);
  void Cblacs_pinfo(int *myrank, int *nprocs);
  void Cblacs_gridinit(int *ictxt, const char *order, const int nprow, const int npcol);
  void Cblacs_gridinfo(const int ictxt, int *nprow, int *npcol, int *myrow, int *mycol);
  void Cblacs_gridexit(const int ictxt);
  void descinit_(int *desc,
      const int *m, const int *n, const int *mb, const int *nb, const int *irsrc, const int *icsrc, const int *ictxt, const int *lld, int *info);

  void Cigebs2d(
    const int ConTxt, const char *scope, const char *top,
    const int m, const int n, const int *A, const int lda);
  void Cigebr2d(
    const int ConTxt, const char *scope, const char *top,
    const int m, const int n, int *A, const int lda,
    const int rsrc, const int csrc);

  // compute LOCr or LOCc (local size of data for distributed array)
  int numroc_(const int *n, const int *nb, const int *iproc, const int *isrcproc, const int *nprocs);

}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int myid, numprocs;
  int ictxt, myrow, mycol;
  int nprow = 2, npcol = 1;
  int magic = 4;
  Cblacs_pinfo(&myid, &numprocs);
  Cblacs_get(0, 0, &ictxt);
  Cblacs_gridinit(&ictxt, "Row", nprow, npcol);
  Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);
  std::this_thread::sleep_for(std::chrono::seconds(magic-myid));
  std::cout << "[" << myid << "] :" << "ictxt: " << ictxt << std::endl;
  std::cout << "[" << myid << "] :" << "nprow and npcol: " << nprow << " " << npcol << std::endl;
  std::cout << "[" << myid << "] :" << "myrow and mycol: " << myrow << " " << mycol << std::endl;

  char charc = 'c', chars = ' ';

  int vcurrow = 1;
  int sendv, recvv;
  if (myid == vcurrow) {
    sendv = 2;
    Cigebs2d(ictxt, &charc, &chars, 1, 1, &sendv, 1);
    std::cout << "[" << myid << "] :" << "sendv: " << sendv << std::endl;
  } else {
    Cigebr2d(ictxt, &charc, &chars, 1, 1, &recvv, 1, mycol, vcurrow);
    std::cout << "[" << myid << "] :" << "recvv: " << recvv << std::endl;
  }

  Cblacs_gridexit(ictxt);
  MPI_Finalize();
  return 0;
}
```

The output is


```
[1] :ictxt: 0
[1] :nprow and npcol: 2 1
[1] :myrow and mycol: 1 0
[0] :ictxt: 0
[0] :nprow and npcol: 2 1
[0] :myrow and mycol: 0 0
[0] :recvv: 4
[1] :sendv: 4
```

which totally confuses me. I also found that the printed value is actually the value of the `magic` variable. Why?

(I only use the sleep call to make the output clearer.)

I also tried replacing `igebs2d` and `igebr2d` with `MPI_Bcast`, which works as expected (tested only in this example).

j7168908jx, May 21 '24 02:05