Implement reductions in parallel loops (easy starter)
| Field | Value |
| ----- | ----- |
| Bugzilla Link | 52312 |
| Version | unspecified |
| OS | Linux |
| CC | @joker-eph |
Extended Description
Support parallel reductions. Currently only vector loops support reductions, but forall loops (i.e., scf::ParallelOp) can handle reductions too.
Relevant entry point: the isParallelFor() method in https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
assigned to @aartbik
I am interested in working on this issue.
After looking at the isParallelFor() method, I think we can enable parallel reductions simply by removing !isReduction from the return statement in each switch case. We should also modify the sparse-output condition so that parallelization with a reduction is permitted.
All other things will probably be taken care of by the existing infrastructure.
To summarize, I am proposing the following isParallelFor() method:
```c++
static bool isParallelFor(CodeGen &codegen, bool isOuter, bool isReduction,
                          bool isSparse, bool isVector) {
  if (codegen.sparseOut && !isReduction)
    return false;
  switch (codegen.options.parallelizationStrategy) {
  case SparseParallelizationStrategy::kNone:
    return false;
  case SparseParallelizationStrategy::kDenseOuterLoop:
    return isOuter && !isSparse && !isVector;
  case SparseParallelizationStrategy::kAnyStorageOuterLoop:
    return isOuter && !isVector;
  case SparseParallelizationStrategy::kDenseAnyLoop:
    return !isSparse && !isVector;
  case SparseParallelizationStrategy::kAnyStorageAnyLoop:
    return !isVector;
  }
  llvm_unreachable("unexpected parallelization strategy");
}
```
Am I on the right track? @aartbik
On a separate note, what is the preferred way of checking the parallel execution strategy of the generated loop (so that I can verify that my solution works)? I am hesitant to inspect the generated IR manually, since I am not very familiar with the OpenMP dialect.
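For context, this kind of sparse-compiler behavior is typically verified with a FileCheck test over mlir-opt output (the OpenMP dialect is not involved at this lowering stage). A hypothetical test file might look like the following; the exact pass-option spelling varies across MLIR revisions:

```mlir
// RUN: mlir-opt %s --sparsification="parallelization-strategy=any-storage-any-loop" | FileCheck %s

// A parallel lowering should produce scf.parallel (with scf.reduce for the
// reduction) rather than a plain sequential scf.for.
// CHECK: scf.parallel
// CHECK: scf.reduce
```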
Removing these flags is of course the first step, but the existing infrastructure will not simply take care of the rest: you would end up with a parallel loop construct that has a loop-carried dependence. Please have a look at ParallelOp in the SCF dialect, in particular the scf.reduce/scf.reduce.return constructs, to see what else is required.
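For reference, a sum reduction expressed with scf.parallel/scf.reduce looks roughly like this (a sketch in the SCF dialect syntax of that era; details such as the scf.reduce form differ across MLIR versions):

```mlir
%zero = arith.constant 0.0 : f32
// The init operand seeds the reduction; the scf.reduce region states how
// two partial results are combined, which makes the loop-carried
// dependence explicit and legal to parallelize.
%sum = scf.parallel (%i) = (%lb) to (%ub) step (%c1) init (%zero) -> f32 {
  %elem = memref.load %buf[%i] : memref<?xf32>
  scf.reduce(%elem) : f32 {
  ^bb0(%lhs: f32, %rhs: f32):
    %r = arith.addf %lhs, %rhs : f32
    scf.reduce.return %r : f32
  }
}
```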
I can try to take a look at it.
Check out https://reviews.llvm.org/D135927
Completed by Peiming.