numba icon indicating copy to clipboard operation
numba copied to clipboard

LLVM IR generation bug when adding metadata for `excinfo`-related store instruction

Open dlee992 opened this issue 1 year ago • 8 comments

I noticed in a certain case, numba will emit LLVM IR like this:

B10.if:                                           ; preds = %B10.if.loopexit, %B10.if.loopexit10
  %excinfo.2.0 = phi { i8*, i32, i8*, i8*, i32 }* [ @.const.picklebuf.23377073394240, %B10.if.loopexit10 ], [ @.const.picklebuf.23377075691904, %B10.if.loopexit ]
  %2 = bitcast i32* %.9.i to i8*
  %3 = bitcast i32* %.7.i to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %3)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %2)
  store { i8*, i32, i8*, i8*, i32 }* %excinfo.2.0, { i8*, i32, i8*, i8*, i32 }** %excinfo, align 8
  br label %common.ret

I think this branch is used for storing excinfo and return. But the store instr doesn't have the metadata of numba_exception_output, which makes the ref_prune pass in llvmlite can't remove some ref pairs. The source code in llvmlite:

    bool isRaising(const BasicBlock *bb) {
        for (auto &instruction : *bb) {
            if (instruction.getOpcode() == Instruction::Store) {
                auto name = dyn_cast<StoreInst>(&instruction)
                                ->getPointerOperand()
                                ->getName();
                if (name.equals("excinfo") &&
                    instruction.hasMetadata("numba_exception_output")) {
                    return true;
                }
            }
        }
        return false;
    }

Willing to provide a python test case later. Trying to find a fix meanwhile.

dlee992 avatar Apr 29 '24 19:04 dlee992

Yeah, in numba/core/callconv.py, more than one location can generate excinfo-related store instr, but only one location adds the metadata for it.

Can push a PR soon.

dlee992 avatar Apr 29 '24 19:04 dlee992

could be related to https://github.com/numba/numba/issues/9186, where I observed this excinfo change hurts the performance.

dlee992 avatar Apr 29 '24 19:04 dlee992

@dlee992, I wasn't aware of the numba_exception_output metadata. On callconv.py::set_dynamic_user_exc, one should set the metadata for the last store instruction as well.

https://github.com/numba/numba/blob/4318ee4aeaaad99fa5965d8b723fc0c91581cab9/numba/core/callconv.py#L707-L715

Can you try that?

guilhermeleobas avatar Apr 30 '24 14:04 guilhermeleobas

The LLVM IR is from post-optimization and LLVM is allowed to move/remove metadata as it wish. Perhaps, we should stop using metadata but instead detect the store going to %exeinfo which must be one of the argument on the LLVM function.

So, in isRaising, it should detect that the store target is the LLVM argument for the exception pointer.

sklam avatar Apr 30 '24 14:04 sklam

@sklam, should we go with adding the metadata for the missing case? Or just try to fix the isRaising in llvmlite?

guilhermeleobas avatar Apr 30 '24 15:04 guilhermeleobas

@guilhermeleobas, yes, I tried to add more metadata in #9551 . And my internal test case's performance is better with that. ex: w/o this PR, calls of NRT_decref are 3000 times; w/ this PR, the calls go down to 1800 times.

@sklam, yes, changing the detection way in llvmlite side is also a solution.

dlee992 avatar Apr 30 '24 15:04 dlee992

This issue is marked as stale as it has had no activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with any updates and confirm that this issue still needs to be addressed.

github-actions[bot] avatar May 31 '24 01:05 github-actions[bot]

The discussion seems like this is an actual issue, though I'm not clear about the exact nature of it, so I've labelled as "incorrect behaviour" and removed "needtriage" so it won't go stale again.

gmarkall avatar May 31 '24 10:05 gmarkall