memref lowering: stack overflow due to alloca's for memref descriptors for calls inside a loop

Open bondhugula opened this issue 6 years ago • 1 comments

When lowering MLIR to LLVM, memrefs are lowered to LLVM structs that hold the descriptor info. The alloca's for these structs can exhaust stack space when calls with memref arguments appear inside a loop. Here's an example snippet:

  affine.for %arg6 = #map11(%arg4) to #map12(%arg4) {
    call @foo(%0, %1, %arg2, %arg6) : (memref<64x512xf32>, memref<512x1xvector<8xf32>>, memref<2048x256xvector<8xf32>>, index) -> ()
  }

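The call to @foo will be preceded by three alloca's, one for each memref passed. Stack memory obtained through alloca is released only when the enclosing function returns, so every iteration adds three more descriptors to the stack. The lowered LLVM dialect snippet is below; given a typical number of %arg6 iterations, it will run out of stack space (with 8 MB stacks).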

  ^bb19(%151: !llvm.i64):  // 2 preds: ^bb18, ^bb20
    %152 = llvm.icmp "slt" %151, %149 : !llvm.i64
    llvm.cond_br %152, ^bb20, ^bb21
  ^bb20:  // pred: ^bb19
    %153 = llvm.mlir.constant(1 : index) : !llvm.i64
    %154 = llvm.alloca %153 x !llvm<"{ float*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %32, %154 : !llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">
    %155 = llvm.mlir.constant(1 : index) : !llvm.i64
    %156 = llvm.alloca %155 x !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %102, %156 : !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    %157 = llvm.mlir.constant(1 : index) : !llvm.i64
    %158 = llvm.alloca %157 x !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }"> : (!llvm.i64) -> !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.store %2, %158 : !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">
    llvm.call @foo(%154, %156, %158, %151) : (!llvm<"{ float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ <8 x float>*, i64, [2 x i64], [2 x i64] }*">, !llvm.i64) -> ()
    %159 = llvm.add %151, %150 : !llvm.i64
    llvm.br ^bb19(%159 : !llvm.i64)
  ^bb21:  // pred: ^bb19
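
As a rough sanity check of the claim above, the sketch below estimates how many loop iterations fit before the descriptors alone fill the stack. It assumes a 64-bit target (8-byte pointers and i64 fields, so the struct needs no padding) and that the whole 8 MiB is available for descriptors, which makes the result an upper bound. The helper names are made up for illustration.

```python
def descriptor_bytes(rank: int) -> int:
    """Size of { T*, i64, [rank x i64], [rank x i64] }, assuming 8-byte fields."""
    pointer = 8          # data pointer
    offset = 8           # i64 offset
    sizes = 8 * rank     # [rank x i64]
    strides = 8 * rank   # [rank x i64]
    return pointer + offset + sizes + strides


def iterations_until_overflow(stack_bytes: int, descriptors_per_call: int, rank: int) -> int:
    """Number of loop iterations whose descriptors fit in stack_bytes."""
    per_iteration = descriptors_per_call * descriptor_bytes(rank)
    return stack_bytes // per_iteration


STACK_BYTES = 8 * 1024 * 1024  # the 8 MB stack mentioned in the issue

print(descriptor_bytes(2))                            # 48
print(iterations_until_overflow(STACK_BYTES, 3, 2))   # 58254
```

Under these assumptions, roughly 58 thousand executions of the loop body within a single invocation of the enclosing function are enough to exhaust the stack; in practice the limit is lower because other frames also use stack space.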

Increasing the stack size is a stopgap: it works around the issue here but doesn't fix it. I think this issue requires the same approach as with block-local variables in C/C++ (say, large structs with loop-body scope)? Another solution is to insert these alloca's at the highest level, i.e., right after the descriptors are defined (%32, %102, and %2 above).
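
To make the second option concrete, here is a toy model of the hoisting, not the actual lowering code. `Op`, `hoist_allocas`, and `stack_bytes_used` are hypothetical names, the IR is reduced to op kinds and SSA names, and each descriptor is assumed to take 48 bytes (rank 2, 64-bit target). Only the allocations move; the stores and the call stay in the loop.

```python
from dataclasses import dataclass
from typing import List, Tuple

DESCRIPTOR_BYTES = 48  # assumed size of one rank-2 descriptor on a 64-bit target


@dataclass(frozen=True)
class Op:
    """One operation in a toy IR (hypothetical, not an MLIR API)."""
    kind: str                       # "alloca", "store", "call", ...
    result: str = ""                # SSA name defined by the op, if any
    operands: Tuple[str, ...] = ()  # SSA names used by the op


def hoist_allocas(body: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Split a loop body into (allocations to run once before the loop, remaining body)."""
    hoisted = [op for op in body if op.kind == "alloca"]
    remaining = [op for op in body if op.kind != "alloca"]
    return hoisted, remaining


def count_allocas(ops: List[Op]) -> int:
    return sum(1 for op in ops if op.kind == "alloca")


def stack_bytes_used(before_loop: List[Op], body: List[Op], trip_count: int) -> int:
    """Stack taken by descriptors: once for ops before the loop, once per iteration inside it."""
    executed = count_allocas(before_loop) + trip_count * count_allocas(body)
    return DESCRIPTOR_BYTES * executed


# Toy version of block ^bb20 from the snippet above (types and constants omitted).
body = [
    Op("alloca", "%154"),
    Op("store", operands=("%32", "%154")),
    Op("alloca", "%156"),
    Op("store", operands=("%102", "%156")),
    Op("alloca", "%158"),
    Op("store", operands=("%2", "%158")),
    Op("call", operands=("%154", "%156", "%158", "%151")),
    Op("add", "%159", ("%151", "%150")),
]

hoisted, remaining = hoist_allocas(body)
print(stack_bytes_used([], body, 100_000))            # 14400000 (allocations in the loop)
print(stack_bytes_used(hoisted, remaining, 100_000))  # 144 (allocations hoisted)
```

With the allocations placed before the loop, stack use no longer depends on the trip count. The model says nothing about whether the move is legal for a given callee; it only shows the effect on stack growth.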

On a separate note, hoisting such alloca's out is valid here; however, LICM won't do it since alloca's have side effects. Moreover, even if there were a utility to hoist alloca's, it couldn't be applied without knowing what's inside @foo. In a way, the special property of these alloca'ed descriptors is hard to recover later if you don't exploit it at the time you generate them.

Oct 25 '19 15:10 bondhugula

This should be addressed by https://github.com/llvm/llvm-project/commit/5a1778057f72b8e0444a7932144a3fa441b641bc

Feb 11 '20 12:02 ftynse