
Auto loop skewing failed

Open data-panda opened this issue 6 years ago • 1 comments

I have an ICCG solver with data dependencies between iterations. For example, for a 3D matrix, the value at (i, j, k) depends on its neighbours at i-1, i+1, j-1, j+1, k-1, and k+1, as well as on (i, j, k) itself. I took the Gauss-Seidel example from the test cases as a model and modified it for my loop. The original loop looks like this:

#pragma scop
    for (i=1; i<=nx; i++) {
        for (j=1; j<=ny; j++) {
            for (k=1; k<=nz; k++) {
                dummy = COEFF6[i][j][k] * p_sparse_s[i][j][k];

                    if (PeriodicBoundaryX && i == 1)  dummy += COEFF0[i][j][k] * p_sparse_s[nx ][j][k];
                    else                              dummy += COEFF0[i][j][k] * p_sparse_s[i-1][j][k];

                    if (PeriodicBoundaryX && i == nx) dummy += COEFF1[i][j][k] * p_sparse_s[1  ][j][k];
                    else                              dummy += COEFF1[i][j][k] * p_sparse_s[i+1][j][k];

                    if (PeriodicBoundaryY && j == 1)  dummy += COEFF2[i][j][k] * p_sparse_s[i][ny ][k];
                    else                              dummy += COEFF2[i][j][k] * p_sparse_s[i][j-1][k];

                    if (PeriodicBoundaryY && j == ny) dummy += COEFF3[i][j][k] * p_sparse_s[i][  1][k];
                    else                              dummy += COEFF3[i][j][k] * p_sparse_s[i][j+1][k];

                    if (PeriodicBoundaryZ && k == 1)  dummy += COEFF4[i][j][k] * p_sparse_s[i][j][nz ];
                    else                              dummy += COEFF4[i][j][k] * p_sparse_s[i][j][k-1];

                    if (PeriodicBoundaryZ && k == nz) dummy += COEFF5[i][j][k] * p_sparse_s[i][j][  1];
                    else                              dummy += COEFF5[i][j][k] * p_sparse_s[i][j][k+1];

                    ap_sparse_s[i][j][k] = dummy;
                    pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];
            }
        }
    }
#pragma endscop

For this loop, Pluto fails with a syntax error that refers to the first if statement. If I remove all the if/else clauses (dropping the periodic boundaries in question) and write the loop as

#pragma scop
    for (i=1; i<=nx; i++) {
        for (j=1; j<=ny; j++) {
            for (k=1; k<=nz; k++) {
                ap_sparse_s[i][j][k]=  COEFF0[i][j][k] * p_sparse_s[i-1][j][k]
                                     + COEFF1[i][j][k] * p_sparse_s[i+1][j][k]
                                     + COEFF2[i][j][k] * p_sparse_s[i][j-1][k]
                                     + COEFF3[i][j][k] * p_sparse_s[i][j+1][k]
                                     + COEFF4[i][j][k] * p_sparse_s[i][j][k-1]
                                     + COEFF5[i][j][k] * p_sparse_s[i][j][k+1]
                                     + COEFF6[i][j][k] * p_sparse_s[i][j][k] ;
                pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];
            }
        }
    }
#pragma endscop

then I am able to run it through Pluto. I expected some red-black ordering or wavefront transformation of the loop, but Pluto emits this as the optimised loop:

  int t1, t2, t3, t4;
 int lb, ub, lbp, ubp, lb2, ub2;
 register int lbv, ubv;
/* Start of CLooG code */
if ((nx >= 1) && (ny >= 1) && (nz >= 1)) {
  for (t1=1;t1<=nx;t1++) {
    for (t2=1;t2<=ny;t2++) {
      for (t3=1;t3<=nz;t3++) {
        ap_sparse_s[t1][t2][t3]= COEFF0[t1][t2][t3] * p_sparse_s[t1-1][t2][t3] + COEFF1[t1][t2][t3] * p_sparse_s[t1+1][t2][t3] + COEFF2[t1][t2][t3] * p_sparse_s[t1][t2-1][t3] + COEFF3[t1][t2][t3] * p_sparse_s[t1][t2+1][t3] + COEFF4[t1][t2][t3] * p_sparse_s[t1][t2][t3-1] + COEFF5[t1][t2][t3] * p_sparse_s[t1][t2][t3+1] + COEFF6[t1][t2][t3] * p_sparse_s[t1][t2][t3] ;;
        pipi_sparse += p_sparse_s[t1][t2][t3] * ap_sparse_s[t1][t2][t3];;
      }
    }
  }
}

The output loop is exactly the same as the input loop, and the loop-carried dependence is still there.

Could you see what is wrong?

data-panda, Aug 06 '18 09:08

Your loops only include the ones traversing the data space. Don't you plan to model the time loop as well? In the code pasted above, you have a reduction on pipi_sparse. Pluto doesn't parallelize reductions, and the dependence on that reduction scalar is what makes the whole loop nest sequential here.

bondhugula, Aug 11 '18 15:08
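
The point about the reduction can be illustrated with a minimal sketch. It is not taken from the thread: the grid sizes NX, NY, NZ, the grid_t typedef, and the helpers fill_example_data, apply_fused, and apply_split are hypothetical names chosen for illustration. The idea is plain loop distribution: the stencil that writes ap_sparse_s goes in one nest, which reads only p_sparse_s and the COEFF arrays and so carries no dependence between iterations, and the pipi_sparse reduction goes in a second nest. Whether Pluto then transforms the first nest the way the original post hoped is not something this sketch shows; it only shows that both versions compute the same result.

```c
/* Hypothetical grid size: interior cells are 1..N, plus a one-cell halo. */
#define NX 4
#define NY 5
#define NZ 6

typedef double grid_t[NX + 2][NY + 2][NZ + 2];

grid_t COEFF0, COEFF1, COEFF2, COEFF3, COEFF4, COEFF5, COEFF6;
grid_t p_sparse_s, ap_sparse_s;

/* Fill all inputs with small integer values (illustrative data only). */
void fill_example_data(void) {
    grid_t *coeffs[7] = {&COEFF0, &COEFF1, &COEFF2, &COEFF3,
                         &COEFF4, &COEFF5, &COEFF6};
    for (int i = 0; i < NX + 2; i++)
        for (int j = 0; j < NY + 2; j++)
            for (int k = 0; k < NZ + 2; k++) {
                p_sparse_s[i][j][k] = (double)((i + 2 * j + 3 * k) % 7);
                for (int c = 0; c < 7; c++)
                    (*coeffs[c])[i][j][k] = (double)((c + i + j + k) % 5);
            }
}

/* Fused form, as in the issue: stencil and reduction share one loop body. */
double apply_fused(void) {
    int i, j, k, nx = NX, ny = NY, nz = NZ;
    double pipi_sparse = 0.0;
    for (i = 1; i <= nx; i++)
        for (j = 1; j <= ny; j++)
            for (k = 1; k <= nz; k++) {
                ap_sparse_s[i][j][k] = COEFF0[i][j][k] * p_sparse_s[i-1][j][k]
                                     + COEFF1[i][j][k] * p_sparse_s[i+1][j][k]
                                     + COEFF2[i][j][k] * p_sparse_s[i][j-1][k]
                                     + COEFF3[i][j][k] * p_sparse_s[i][j+1][k]
                                     + COEFF4[i][j][k] * p_sparse_s[i][j][k-1]
                                     + COEFF5[i][j][k] * p_sparse_s[i][j][k+1]
                                     + COEFF6[i][j][k] * p_sparse_s[i][j][k];
                pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];
            }
    return pipi_sparse;
}

/* Distributed form: only the stencil sits between the scop markers. */
double apply_split(void) {
    int i, j, k, nx = NX, ny = NY, nz = NZ;
    double pipi_sparse = 0.0;
#pragma scop
    for (i = 1; i <= nx; i++)
        for (j = 1; j <= ny; j++)
            for (k = 1; k <= nz; k++)
                ap_sparse_s[i][j][k] = COEFF0[i][j][k] * p_sparse_s[i-1][j][k]
                                     + COEFF1[i][j][k] * p_sparse_s[i+1][j][k]
                                     + COEFF2[i][j][k] * p_sparse_s[i][j-1][k]
                                     + COEFF3[i][j][k] * p_sparse_s[i][j+1][k]
                                     + COEFF4[i][j][k] * p_sparse_s[i][j][k-1]
                                     + COEFF5[i][j][k] * p_sparse_s[i][j][k+1]
                                     + COEFF6[i][j][k] * p_sparse_s[i][j][k];
#pragma endscop
    /* The reduction stays sequential, outside the region handed to the tool. */
    for (i = 1; i <= nx; i++)
        for (j = 1; j <= ny; j++)
            for (k = 1; k <= nz; k++)
                pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];
    return pipi_sparse;
}
```

Calling fill_example_data() and then comparing apply_fused() with apply_split() gives the same pipi_sparse, since both visit the cells in the same order. A regular C compiler ignores the scop pragmas, possibly with an unknown-pragma warning.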