circt [HWToLLVM][Arc] Add out-of-bounds handler for array accesses

Prevent hw.array_get operations from exceeding their memory bounds after lowering to LLVM. This adds a dynamic check to the index operand (if required) which can invoke a handler function to fix-up the situation or just terminate.

A stub handler quietly redirecting the access to the array's first element is added directly to the IR. The handler can be overridden at link time to enable user-defined behavior. A "default custom" handler is added to the arcilator runtime header, which emits an error message and terminates the simulation.

I'd be happy to get some feedback on the general approach first, so I'll mark this as draft. If we can agree on this approach, I'll add tests and a similar handler for hw.array_slice.

Fixes #6949

Apr 26 '24 15:04 fzi-hielscher

Sorry for the late review!

This approach allows us to override the OOB access behavior at a very late stage in the lowering pipeline. I wonder if we necessarily need this or whether it might be better to leave it more restricted.

I assume we want to define OOB accesses the same way as division by zero. This means we get an undefined value. Optimization passes can use these semantics for optimizations as long as the transformations are refinements, e.g., always return 0. The same restriction has to hold for the callback functions provided here, otherwise, the original language semantics might not be respected, making it easy for the user to accidentally break things. Furthermore, the user cannot assume that their callback function is executed for all OOB accesses originally in the code because some of them might already have been optimized away at this stage (e.g., making it useless for collecting metrics/coverage).

If a language spec leaves some room for interpretation on what should happen in such a case during simulation, I feel like it might be better to let the user influence this at some point during the frontend language lowering pipeline already.

Do you have particular use cases in mind that require this generality?

May 07 '24 07:05 maerhart

Sorry for the late review!

No worries! Thanks a lot for your feedback.

This approach allows us to override the OOB access behavior at a very late stage in the lowering pipeline. I wonder if we necessarily need this or whether it might be better to leave it more restricted.

I would not argue that we need this, but in general I like to err on the side of flexibility if it comes at a low cost. The combination of weak linking and the branch weight should hopefully keep the performance impact at a minimum, compared to an inlined handler, while still giving users the option to hook into it. At the same time it is effectively transparent for users who do not define a custom handler.

I assume we want to define OOB accesses the same way as division by zero. This means we get an undefined value. Optimization passes can use these semantics for optimizations as long as the transformations are refinements, e.g., always return 0. The same restriction has to hold for the callback functions provided here, otherwise, the original language semantics might not be respected, making it easy for the user to accidentally break things.

Usually I am the last person to argue against following language semantics as specified. And I'd insist that this should be the default. But I guess my mind has been (pun intended) poisoned by certain "unfortunate" language design decisions (e.g., implicit coercion of X to 0 in Verilog) that I see value in having the option to override them. Naturally, this should not occur by accident, but do you really see the risk of accidentally linking a "wrong" handler?

Furthermore, the user cannot assume that their callback function is executed for all OOB accesses originally in the code because some of them might already have been optimized away at this stage (e.g., making it useless for collecting metrics/coverage).

This is a good point I have not really considered so far. But I do not necessarily see it as a deal breaker. I guess the goal from my perspective would be to have the Arcilator simulation resemble the amount of non-determinism of a post ExportVerilog simulation given the same IR as closely as possible. From that point I would generally assume that any refinement of a non-deterministic to a deterministic expression would affect both simulations in the same way. It also touches on the high-level question of to what extent we even want to refine non-determinism. Of course, for verification non-determinism stinks and the more we can get rid of, the better. But I think we should also consider that any refinement to a specific value might constraint a synthesis backend in an unfavorable way. In essence, I would not consider the handler as a "guaranteed to be invoked if an OOB access occurs by the input language semantics" function. Instead, its invocation shows that it can happen and the user might want to know about this.

If a language spec leaves some room for interpretation on what should happen in such a case during simulation, I feel like it might be better to let the user influence this at some point during the frontend language lowering pipeline already.

I have mixed feelings regarding what decisions should be made at simulation runtime vs. what should be codified by the frontend into the core IR. I understand the desire of having the IR as single source of truth. But at the same time I consider simulation to be interactive to an extent that cannot possibly be covered by frontend knobs. For OOB accesses specifically, the SystemVerilog LRM states that they may cause a warning message to be issued. Right now, if I'm not mistaken, we do not have any option to specify emission of a runtime warning in the IR, at least not in a way Arcilator can handle. But let's assume we had one. What if I want to have those warnings, but suppress them during a specific time frame (e.g. while the DUT is in reset)? What if I want to ignore them for a specific array, but for others terminate the simulation and create a dump of the state? Technically we could encode all that in the IR, but I'm having a hard time seeing that as a realistic option. So, in the end the frontend would have to install its own hook for a user defined handler, wouldn't it?

Do you have particular use cases in mind that require this generality?

Nothing specific, no. But if we should go the route of division by zero and specify that an OOB access is an arbitrary combinatorial function on the array's contents and the index, IMHO it would be nice to give users the option to exercise this freedom and see if it breaks the design. E.g., return the value at a randomly selected index, or the MD5 sum of the array's contents or whatnot. More generally an (for lack of a better term) "ABI" for interactions between the C/C++ runtime environment and the model's eval function, giving an easy way of hooking into the simulation, is definitely on the list of things I'm interested in. And I feel there are few things more cumbersome than having to write custom MLIR passes building LLVM IR to implement mostly static functionality. :grin:

May 07 '24 15:05 fzi-hielscher

Pulling this PR out of limbo, I've switched things up a bit: Since the runtime library is now usable for JIT runs, there is little point in having the handler in the IR anymore. So, no more weak linking. Instead, HWToLLVM now takes an option controlling whether it should call a custom function to fix the access or just quietly fall back to the first element.

Feb 14 '25 15:02 fzi-hielscher

I'm not entirely convinced that we should allow for a user-defined out-of-bounds handler. Doing so feels like generalizing the problem too much, and giving the user control over something they will never want to control.

As a concrete example, frontend languages like Verilog define very clearly what happens on out-of-bounds accesses, so when you are using a commercial simulator like VCS or QuestaSim, they don't offer any way for the user to register a callback to provide out-of-bounds values. I think all we need to do in the case of HW-to-LLVM is to ensure that we lower to something that has undefined behaviour, but no immediate undefined behaviour like segfaulting or formatting your hard drive. But since we're free to choose any reasonable behaviour, I think we should choose something that is trivially optimized away by LLVM. Something like skipping the load through control flow (where LLVM can elide the check if it's trivial, or combine adjacent checks), or taking the index modulo the array bounds to avoid control flow. Allowing a user handler adds a lot of complexity and makes the thing very hard to optimize for LLVM.

Feb 18 '25 17:02 fabianschuiki

I can understand your skepticism and it's not a hill I want to die on. It is basically the option to have a poor man's UBSan compiled in. It would be preferable if we could let the frontends deal with it, but I'm afraid we're currently lacking the necessary core IR constructs. It is also a situation where the control-flow vs. data-flow conundrum hits hard.

I think all we need to do in the case of HW-to-LLVM is to ensure that we lower to something that has undefined behaviour, but no immediate undefined behavior like segfaulting or formatting your hard drive.

Honestly, in some situations I'd prefer a segfault over a quietly chosen arbitrary value. At least then I know that something potentially undesirable happened. And, e.g., VHDL simulations will generally halt immediately on a range constraint violation. I do recommend regular hard drive backups BTW. 😉

If we go for the optimizing route, a combination of llvm.mlir.poison and llvm.freeze would probably be the appropriate choice. That would prevent immediate UB without constraining the chosen value in any way. But can you think of an acceptable, more conservative (i.e., noticeable) approach that we could allow users to opt-in?

Feb 19 '25 00:02 fzi-hielscher

circt circt copied to clipboard

[HWToLLVM][Arc] Add out-of-bounds handler for array accesses

circt
circt copied to clipboard