Implement reductions in parallel loops (easy starter)
| Field | Value |
| ----- | ----- |
| Bugzilla Link | 52312 |
| Version | unspecified |
| OS | Linux |
| CC | @joker-eph |
Extended Description
Support parallel reductions. Currently only vector loops support reductions, but forall loops (i.e., scf::ParallelOp) can handle reductions too.
Relevant entry point: the isParallelFor() method in https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
assigned to @aartbik
I am interested in working on this issue.
After looking at the isParallelFor() method, I think we can enable parallel reductions simply by removing !isReduction from the return statement in each switch case. We should also modify the sparse-output condition so that parallelization with a reduction is permitted.
All other things will probably be taken care of by the existing infrastructure.
To summarize, I am proposing the following isParallelFor() method:
```c++
static bool isParallelFor(CodeGen &codegen, bool isOuter, bool isReduction,
                          bool isSparse, bool isVector) {
  if (codegen.sparseOut && !isReduction)
    return false;
  switch (codegen.options.parallelizationStrategy) {
  case SparseParallelizationStrategy::kNone:
    return false;
  case SparseParallelizationStrategy::kDenseOuterLoop:
    return isOuter && !isSparse && !isVector;
  case SparseParallelizationStrategy::kAnyStorageOuterLoop:
    return isOuter && !isVector;
  case SparseParallelizationStrategy::kDenseAnyLoop:
    return !isSparse && !isVector;
  case SparseParallelizationStrategy::kAnyStorageAnyLoop:
    return !isVector;
  }
  llvm_unreachable("unexpected parallelization strategy");
}
```
Am I on the right track? @aartbik
On a separate note, what is the preferred way of checking the parallel execution strategy of the generated loop (so that I can verify that my solution works)? I am hesitant to inspect the generated IR manually, since I am not very familiar with the OpenMP dialect.
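For context, this kind of sparse-compiler behavior is typically verified with a FileCheck test over mlir-opt output (the OpenMP dialect is not involved at this lowering stage). A hypothetical test file might look like the following; the exact pass-option spelling varies across MLIR revisions:

```mlir
// RUN: mlir-opt %s --sparsification="parallelization-strategy=any-storage-any-loop" | FileCheck %s

// A parallel lowering should produce scf.parallel (with scf.reduce for the
// reduction) rather than a plain sequential scf.for.
// CHECK: scf.parallel
// CHECK: scf.reduce
```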
Removing these flags is of course the first step, but the existing infrastructure will not simply take care of the rest: you would end up with a parallel loop construct that has a loop-carried dependence. Please have a look at ParallelOp in the SCF dialect, in particular the scf.reduce/scf.reduce.return constructs, to see what else is required.
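For reference, a sum reduction expressed with scf.parallel/scf.reduce looks roughly like this (a sketch in the SCF dialect syntax of that era; details such as the scf.reduce form differ across MLIR versions):

```mlir
%zero = arith.constant 0.0 : f32
// The init operand seeds the reduction; the scf.reduce region states how
// two partial results are combined, which makes the loop-carried
// dependence explicit and legal to parallelize.
%sum = scf.parallel (%i) = (%lb) to (%ub) step (%c1) init (%zero) -> f32 {
  %elem = memref.load %buf[%i] : memref<?xf32>
  scf.reduce(%elem) : f32 {
  ^bb0(%lhs: f32, %rhs: f32):
    %r = arith.addf %lhs, %rhs : f32
    scf.reduce.return %r : f32
  }
}
```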
I can try to take a look at it.
Check out https://reviews.llvm.org/D135927
Completed by Peiming.