Auto loop skewing failed
I have an ICCG solver with data dependencies between iterations. For example, for a 3D matrix the value at (i, j, k) depends on the neighbouring values at i-1, j-1, k-1 and i+1, j+1, k+1 as well as on (i, j, k) itself. I used the Gauss-Seidel example test case as a model and modified it for my loop. The original loop looks like this:
#pragma scop
for (i=1; i<=nx; i++) {
  for (j=1; j<=ny; j++) {
    for (k=1; k<=nz; k++) {
      dummy = COEFF6[i][j][k] * p_sparse_s[i][j][k];
      if (PeriodicBoundaryX && i == 1)  dummy += COEFF0[i][j][k] * p_sparse_s[nx][j][k];
      else                              dummy += COEFF0[i][j][k] * p_sparse_s[i-1][j][k];
      if (PeriodicBoundaryX && i == nx) dummy += COEFF1[i][j][k] * p_sparse_s[1][j][k];
      else                              dummy += COEFF1[i][j][k] * p_sparse_s[i+1][j][k];
      if (PeriodicBoundaryY && j == 1)  dummy += COEFF2[i][j][k] * p_sparse_s[i][ny][k];
      else                              dummy += COEFF2[i][j][k] * p_sparse_s[i][j-1][k];
      if (PeriodicBoundaryY && j == ny) dummy += COEFF3[i][j][k] * p_sparse_s[i][1][k];
      else                              dummy += COEFF3[i][j][k] * p_sparse_s[i][j+1][k];
      if (PeriodicBoundaryZ && k == 1)  dummy += COEFF4[i][j][k] * p_sparse_s[i][j][nz];
      else                              dummy += COEFF4[i][j][k] * p_sparse_s[i][j][k-1];
      if (PeriodicBoundaryZ && k == nz) dummy += COEFF5[i][j][k] * p_sparse_s[i][j][1];
      else                              dummy += COEFF5[i][j][k] * p_sparse_s[i][j][k+1];
      ap_sparse_s[i][j][k] = dummy;
      pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];
    }
  }
}
#pragma endscop
For this, Pluto fails with a syntax error pointing at the first if statement. If I remove all the if/else clauses (dropping the periodic-boundary handling) and write the loop as
#pragma scop
for (i=1; i<=nx; i++) {
  for (j=1; j<=ny; j++) {
    for (k=1; k<=nz; k++) {
      ap_sparse_s[i][j][k] = COEFF0[i][j][k] * p_sparse_s[i-1][j][k]
                           + COEFF1[i][j][k] * p_sparse_s[i+1][j][k]
                           + COEFF2[i][j][k] * p_sparse_s[i][j-1][k]
                           + COEFF3[i][j][k] * p_sparse_s[i][j+1][k]
                           + COEFF4[i][j][k] * p_sparse_s[i][j][k-1]
                           + COEFF5[i][j][k] * p_sparse_s[i][j][k+1]
                           + COEFF6[i][j][k] * p_sparse_s[i][j][k];
      pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];
    }
  }
}
#pragma endscop
then I am able to run it through Pluto. But where I expected some red-black ordering or a wavefront transformation of the loop, Pluto emits the optimized loop as:
int t1, t2, t3, t4;
int lb, ub, lbp, ubp, lb2, ub2;
register int lbv, ubv;
/* Start of CLooG code */
if ((nx >= 1) && (ny >= 1) && (nz >= 1)) {
  for (t1=1;t1<=nx;t1++) {
    for (t2=1;t2<=ny;t2++) {
      for (t3=1;t3<=nz;t3++) {
        ap_sparse_s[t1][t2][t3] = COEFF0[t1][t2][t3] * p_sparse_s[t1-1][t2][t3]
                                + COEFF1[t1][t2][t3] * p_sparse_s[t1+1][t2][t3]
                                + COEFF2[t1][t2][t3] * p_sparse_s[t1][t2-1][t3]
                                + COEFF3[t1][t2][t3] * p_sparse_s[t1][t2+1][t3]
                                + COEFF4[t1][t2][t3] * p_sparse_s[t1][t2][t3-1]
                                + COEFF5[t1][t2][t3] * p_sparse_s[t1][t2][t3+1]
                                + COEFF6[t1][t2][t3] * p_sparse_s[t1][t2][t3];
        pipi_sparse += p_sparse_s[t1][t2][t3] * ap_sparse_s[t1][t2][t3];
      }
    }
  }
}
The output loop is exactly the same as the input loop, with the loop dependence still in place.
Can you see what is wrong?
Your loops only include the ones traversing the data space; you don't plan to model the time loop as well? Also, in the code pasted above you have a reduction on pipi_sparse. Pluto doesn't parallelize reductions, and the dependence on that reduction scalar is what makes the whole thing sequential here.
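To illustrate that second point, here is a minimal sketch along those lines, assuming the reduction can be hoisted into its own nest outside the SCoP (this split is an illustration, not code from the original exchange; arrays and bounds are taken from the post):

/* Sketch only: the stencil is separated from the reduction. */
#pragma scop
for (i=1; i<=nx; i++) {
  for (j=1; j<=ny; j++) {
    for (k=1; k<=nz; k++) {
      /* This nest reads p_sparse_s and writes ap_sparse_s only, so it
         carries no dependence and Pluto is free to transform it. */
      ap_sparse_s[i][j][k] = COEFF0[i][j][k] * p_sparse_s[i-1][j][k]
                           + COEFF1[i][j][k] * p_sparse_s[i+1][j][k]
                           + COEFF2[i][j][k] * p_sparse_s[i][j-1][k]
                           + COEFF3[i][j][k] * p_sparse_s[i][j+1][k]
                           + COEFF4[i][j][k] * p_sparse_s[i][j][k-1]
                           + COEFF5[i][j][k] * p_sparse_s[i][j][k+1]
                           + COEFF6[i][j][k] * p_sparse_s[i][j][k];
    }
  }
}
#pragma endscop
/* The reduction stays outside the SCoP; it can run sequentially, or be
   handed to OpenMP with a reduction clause (pipi_sparse is assumed to be
   initialized beforehand, as in the original code). */
#pragma omp parallel for private(j,k) reduction(+:pipi_sparse)
for (i=1; i<=nx; i++)
  for (j=1; j<=ny; j++)
    for (k=1; k<=nz; k++)
      pipi_sparse += p_sparse_s[i][j][k] * ap_sparse_s[i][j][k];

With the scalar dependence gone, the stencil nest can be tiled and parallelized outright; the skewed/wavefront schedule asked about would only arise from a true Gauss-Seidel update, i.e. p_sparse_s updated in place under a time loop that is itself placed inside the SCoP.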