
No longer silently hide errors in Metal completion handlers

Open · shoaibkamil opened this issue 1 year ago · 3 comments

Previously, an error executing a command buffer in the Metal backend was only logged via NSLog and never reported to the user as an error. This PR changes the behavior to report the error via halide_error(), which by default calls abort(). Aborting is reasonable here because the error is unrecoverable: the pipeline that caused it has already executed. Clients that override the halide_error() function can now detect this error and try to recover at the application level.
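A minimal sketch of what an application-level handler might look like when it records the error instead of aborting. The handler's signature mirrors Halide's `halide_error_handler_t` (`void(void *user_context, const char *msg)`). In a real application it would presumably be registered with `halide_set_error_handler()`. This sketch does not link against the Halide runtime, and the names `app::record_halide_error` and `app::take_errors` are hypothetical. The mutex is there because a Metal completion handler may run on a different thread from the one that launched the pipeline.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical application-side error sink. The signature mirrors
// Halide's halide_error_handler_t; the Halide runtime is not linked here.
namespace app {

std::mutex g_mutex;
std::vector<std::string> g_errors;  // messages captured instead of aborting

// Records the message (and logs it) rather than calling abort().
void record_halide_error(void * /*user_context*/, const char *msg) {
    const char *text = msg ? msg : "(null)";
    std::lock_guard<std::mutex> lock(g_mutex);
    g_errors.emplace_back(text);
    std::fprintf(stderr, "Halide error: %s\n", text);
}

// Moves any recorded errors into `out`, leaving the internal list empty.
// Returns true if at least one error had been recorded.
bool take_errors(std::vector<std::string> &out) {
    std::lock_guard<std::mutex> lock(g_mutex);
    out.swap(g_errors);
    g_errors.clear();
    return !out.empty();
}

}  // namespace app
```

With this design, the application decides when to check for errors (for example, after a pipeline call or before reusing device buffers). As the PR notes, attributing the error to a specific pipeline is still up to the application.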

This is the simplest fix I could think of for #7780, but it does result in an abort() by default. Alternatively, we could set some kind of error variable in the runtime and make any subsequent call into the Metal runtime (e.g. context acquisition) fail if that variable is set. In either case, it would be up to the calling application to attribute the error to the pipeline that caused it and to figure out how to recover.
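The "error variable" alternative can be sketched roughly as follows. This is a self-contained mock, not Halide's runtime. The names `mock_metal::on_command_buffer_failed`, `acquire_context`, `reset_error`, and the error code values are all hypothetical. The completion handler records a failure, and every later call into the mock runtime fails until the application explicitly resets the state.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical sketch of a "sticky" runtime error; none of these names
// are part of Halide's actual Metal runtime.
namespace mock_metal {

constexpr int kSuccess = 0;
constexpr int kDeviceError = -1;  // placeholder error code

std::atomic<int> g_pending_error{kSuccess};

// Would be called from a command buffer's completion handler when the
// buffer reports an error. Keeps the first error if several arrive.
void on_command_buffer_failed(const char *msg) {
    std::fprintf(stderr, "Metal command buffer failed: %s\n", msg);
    int expected = kSuccess;
    g_pending_error.compare_exchange_strong(expected, kDeviceError);
}

// Stand-in for a runtime entry point such as context acquisition:
// fails fast while an earlier asynchronous error is still pending.
int acquire_context() {
    int err = g_pending_error.load();
    if (err != kSuccess) {
        return err;
    }
    // ... normal context acquisition would happen here ...
    return kSuccess;
}

// The application clears the error once it has decided how to recover.
void reset_error() { g_pending_error.store(kSuccess); }

}  // namespace mock_metal
```

One consequence of this design is that the error surfaces at whichever call comes next. That call may belong to a different pipeline than the one whose command buffer failed, which is the attribution problem described above.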

shoaibkamil · May 23 '24 16:05

It's critical that we support both sorts of runtimes: those that expect halide_error() to abort [the default], and those that expect it merely to log, with errors reported via an error result code [literally everything inside Google]. Is there no way to have an error code eventually returned, uh, somehow?

steven-johnson · May 23 '24 16:05

Is there no way to have an error code eventually returned, uh, somehow?

There is, but all of the ways I could think of seemed more convoluted. This makes me think we should discuss again during a dev meeting to understand whether the alternatives would be acceptable.

shoaibkamil · May 23 '24 18:05

discuss again during a dev meeting

SGTM

steven-johnson · May 23 '24 18:05