ocaml caml_result: an exception monad in C for resource-safe interfaces

In the OCaml runtime, some C functions that in turn call OCaml code come in two variant:

the "raising' variant, for example caml_callback, will raise exceptions directly, as OCaml code would
the 'encoded exception' variant, for example caml_callback_exn, will not raise exceptions but encode them into the return value

The 'encoded exception' variant is important for resource-safe interfaces because the caller may want to run some cleanup/release code before propagating the exception, which the 'raising' variant cannot express -- C code cannot create OCaml exception handlers. On the other hand, the encoding of exceptions into the value type is a big hack that is unsafe in two different ways:

dynamically, the value encoding exceptions are not valid OCaml values, the GC will blow up if it encounters them; callers have to follow a strict discipline of checking for an exception right away and discarding the fake value
statically, reusing the value type means that the C compiler cannot check that we use these exceptions as intended, and will happily let us mix them with real values by mistake, making the dynamic unsafety harder to guard against

In #12407, @gadmm proposed to extend the approach of 'encoded exceptions' to more parts of the runtime, to make it possible to write resource-safe code in more places, but @xavierleroy pushed against the design due to the unsafety of encoded exceptions. To be able to move forward, we need an approach for resource-safe APIs that is consensual; the present PR, which grew out of the discussios in #12407 and in particular discussions with @gadmm, is a proposed approach.

The PR proposes a new 'result' variant: caml_callback_result is similar to caml_callback_exn, but instead of returning an unsafe value it returns a new, distinct C type caml_result, that properly represents either a value or an exception. This is more safe:

dynamically, no invalid values are produced
statically, the type is incompatible with value, reminding us to deal with the exception in the caller

typedef struct {
  _Bool is_exception;
  value data;
} caml_result;

The present PR:

introoduces this caml_result type,
exposes _result variants for all functions that previously had an _exn variant, and
removes almost all uses of encoded exceptions in the runtime code itself.

For the _exn variants, the ones that were part of the public API are kept for compatibility, and those that were under CAML_INTERNALS are removed.

(The places where encoded exceptions are retained are the implementation of the caml_callback*_exn functions, which touch the bytecode interpreter and the assembly code in the runtime system. I preferred to leave them as-is, and wrap them inside caml_result on the outside.)

Mar 06 '24 23:03 gasche

@gadmm told me (private communication) that he is fine with the overall approach if we can gather consensus around this.

Mar 12 '24 12:03 gasche

This looks great, the encoded exception values returned by callback_exn are an easy trap to fall into.

(The places where encoded exceptions are retained are the implementation of the caml_callback*_exn functions, which touch the bytecode interpreter and the assembly code in the runtime system. I preferred to leave them as-is, and wrap them inside caml_result on the outside.)

This is the correct choice. The C ABI rules for returning a structure by value are tricky and vary significantly between platforms. It suits the C API (for which we have a compiler to implement those rules), but it would be unnecessarily complicated to follow these rules correctly in the per-platform assembly stubs.

Mar 13 '24 18:03 stedolan

This needs to be rebased on top of the statmemprof patches, and memprof.c adapted to the new _res convention.

Mar 18 '24 14:03 gadmm

I rebased this PR on top of trunk, and took all the standing review comments into account. Thanks a lot for all this detailed feedback.

Notable changes:

@damiendoligez suggested to simplify caml_process_pending_actions_result, this is done in a commit called "document and refactor [caml_process_pending_actions_result" (I am not giving specify hashes as they will be invalidated on my next rebase, so I recommend to look for it in the list of commits.
as @gadmm pointed out, the newly merged memprof.c codebase used _exn functions quite a bit. I converted them to _result functions in "runtime: remove encoded exceptions from memprof.c", and this can be taken as an example of _result-heavy code. (@gadmm had some justified worries about the heaviness of the current naming conventions, maybe we can check that they are acceptable there)
@gadmm wanted another name for caml_run_result, closer to caml_raise_if_exception. I moved caml_raise_if_exception (which was CAML_INTERNALS) out of the way, and renamed caml_run_result into caml_get_value_or_raise -- but my mind is at peace with the idea that I may need to change it again and again. Instead of a function exposed in mlvalues.h and defined in fail_{byt,nat}.c, this is now an inline function of fail.h.

Besides the naming of caml_get_value_or_raise, there are some questions left in the air about whether we want the conversion functions between caml_result and encoded exceptions to be made public or not.

Mar 19 '24 17:03 gasche

Of course I forgot two points.

I said that I would move from _Bool to int, but I am feeling nostalgic about a proper boolean type already -- and then maybe I need to use intnat, I am not sure. I am hoping that more knowledgeable people will conclusively say that one of these options is much better than the other.

@xavierleroy asked whether this caml_result design works in Rust. The answer is "yes, according to bindgen" (the standard Rust tool that takes C headers and generates compatible Rust definitions for them), as reported by @gadmm (quoting from his log):

Input: h.h

typedef unsigned long long value;

typedef struct caml_result {
    _Bool is_exception;
    value data;
} caml_result;

caml_result caml_process_pending_actions_res();

Output: bindgen h.h (with some boring tests elided)

/* automatically generated by rust-bindgen 0.69.4 */

pub type value = ::std::os::raw::c_ulonglong;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct caml_result {
    pub is_exception: bool,
    pub data: value,
}
#[test]
fn bindgen_test_layout_caml_result() {
    const UNINIT: ::std::mem::MaybeUninit<caml_result> = ::std::mem::MaybeUninit::uninit();
    let ptr = UNINIT.as_ptr();
    assert_eq!(
        ::std::mem::size_of::<caml_result>(),
        16usize,
        concat!("Size of: ", stringify!(caml_result))
    );
    assert_eq!(
        ::std::mem::align_of::<caml_result>(),
        8usize,
        concat!("Alignment of ", stringify!(caml_result))
    );
}
extern "C" {
    pub fn caml_process_pending_actions_res() -> caml_result;
}

This suggests that the caml_result type does have a representation compatible with Rust structures (when constrained by the #[repr(C)] pragma), so interoperability should work fine. This is not an idiomatic way to deal with results-or-failures in Rust (as a user I would expect a Result<OCamlValue,OCamlValue> type instead), but conversion functions to can be used on the Rust side if desired. We cannot hope to do better today as the standard Result type in Rust is not repr(C), so there is no way to bind it directly from C in a compatible way. (While studying this I found the tool cbindgen that supports repr(C) enums as really tagged unions, but it does not apply here.)

Mar 19 '24 17:03 gasche

I said that I would move from _Bool to int, but I am feeling nostalgic about a proper boolean type already -- and then maybe I need to use intnat

int should work, as we don't have to guarantee size and alignment for the caml_result type. I also thought of

typedef struct caml_result {
    enum { OK = 0, ERR = 1 } status;
    value data;
} caml_result;

which gets you even closer to the result type in Rust or OCaml. Plus, it leaves open the possibility of adding other kinds of results later, such as FATAL_ERROR = 2, i.e. a result that caml_get_value_or_raise would turn into a fatal error. Just an idea.

Mar 19 '24 17:03 xavierleroy

@gasche has already explored the idea of having the result type extensible in the future. However I don't see how it fits with the stability guarantees needed.

I don't think there is a decisive argument for/against any of the suggested representations, just pick any one.

Mar 20 '24 18:03 gadmm

I did a round of modification following the reviews, thanks!

For now I have used int for the is_exception field.

I also thought of
typedef struct caml_result {
    enum { OK = 0, ERR = 1 } status;
    value data;
} caml_result;
which gets you even closer to the result type in Rust or OCaml. Plus, it leaves open the possibility of adding other kinds of results later, such as FATAL_ERROR = 2, i.e. a result that caml_get_value_or_raise would turn into a fatal error. Just an idea.

I did, in a previous iteration of this PR, use an enum with two values: CAML_VALUE and CAML_EXCEPTION (I am not sure if enum names need namespacing in C, but I thought that they did). The result was fairly heavy/cumbersome, because I would then feel tempted to use switch on this field, and have to handle the default case, etc. Maybe I should revisit this with explicit equality tests instead of a switch.

The only form of extensibility that I could think of was an eventual future version of caml_result that could also represent unhandled effects. But this seemed like science-fiction, I doubted that the extension would be grateful, and I am skeptical that we would ever try to have C layers support continuation capture. I don't know about Fatal_error, but a closely related idea is Asynchronous_exception if the proposal to distinguish them from "normal exception" goes through.

What I would ideally like to do to have something more future-proof than the current definition is the following:

Expose a function to test if a result is an exception, instead of accessing the field directly. But I don't know how to name this function, in a way that would not conflict with Is_exception_result, which was an excellent name but is now taken for encoded exceptions. (I thought of reappropriating the name by removing the previous function, but that sounds a bit wild for something outside CAML_INTERNALS.)
Hide the definition of caml_result as a structure, to force users to go through the Result_value and Result_exception constructors, and I suppose some accessors as well. But I don't know how to do this in C in a way that does not hurt performance by breaking the inlining of the trivial wrappers.

Apr 01 '24 19:04 gasche

Also: @gadmm proposed changing the convention from foo_result to foo_res. His arguments make sense, so I am thinking of giving the rename a try to see how it goes in the code, but I haven't had the time to do so yet.

Apr 01 '24 20:04 gasche

Thanks for the feedback, I was going to insist about it :) In the related PR, the _result suffix often leads to ambiguities

caml_threadstatus_new_exn -> caml_threadstatus_new_result
caml_thread_domain_initialize_hook_exn -> caml_thread_domain_initialize_hook_result
caml_check_error_exn -> caml_check_error_result

Apr 01 '24 21:04 gadmm

This PR has suffered from a decision paralysis on my side: I wasn't sure what definition of caml_result was the right definition to expose, and paralyzed by the fact that it would be publicly accessible and set in stone forever.

I just pushed the following change:

the definition of mlvalues.h explicitly declares that the structure definition is private -- the type should be considered as opaque
to let users use results without accessing the structure fields, a new int caml_result_is_exception(caml_result) function is offered
this abstraction layer is now used almost everywhere in the runtime

Relevant code:

typedef struct caml_result_private caml_result;

/* This structure should be considered internal, its definition may
   change in the future. Its public interface is formed of
   - Result_value, Result_exception
   - caml_result_is_exception
   - caml_get_value_or_raise (in fail.h)
*/
struct caml_result_private {
  int is_exception;
  value data;
};

#define Result_value(v) (struct caml_result_private){ .is_exception = 0, .data = v }
#define Result_exception(exn) (struct caml_result_private){ .is_exception = 1, .data = exn }

Caml_inline int caml_result_is_exception(struct caml_result_private result)
{
  return result.is_exception;
}

I am now at peace with the idea that an int is_exception field might not be the ideal definition to set in stone forever, and willing to move forward again with the PR.

My plan for now is to try the _res suffixing rename suggested by @gadmm, possibly push this rename if I like the result better, and then move toward merging the PR based on @damiendoligez's approval. Do not hesitate to speak up if you would like to see things done differently.

May 19 '24 16:05 gasche

I pushed a rename of the foo_result functions into foo_res. I don't have a strong opinion on which is better.

This has the downside of being harder to understand, which is arguably also an upside as it clearly marks the functions as following a specific convention. This avoid clashes with some functions which already used _result for other purposes, for example sync_result or caml_lazy_read_result.

Example of code following the new convention:

  /* Call signal handlers first */
  caml_result result = caml_process_pending_signals_res();
  if (caml_result_is_exception(result)) goto exception;

  /* Call memprof callbacks */
  result = caml_memprof_run_callbacks_res();
  if (caml_result_is_exception(result)) goto exception;

  /* Call finalisers */
  result = caml_final_do_calls_res();
  if (caml_result_is_exception(result)) goto exception;

May 19 '24 20:05 gasche

Note that saying that the definition is private does not necessarily mean that you can change it later. Adding a new case can still break programs. So I do not really see the point.

It is necessary to update the documentation, how do you see it being approached?

Thanks for changing to _res, this definitely reduces the risks of ambiguity.

May 21 '24 13:05 gadmm

It is necessary to update the documentation, how do you see it being approached?

Yes, I agree that I should probably update the documentation. I may try to prepare a draft PR to submit separately, but hold off from any merge-related action here before that documentation is at least submitted.

May 21 '24 15:05 gasche

I propose documentation for caml_result in the FFI chapter of the manual in #13216.

Jun 04 '24 21:06 gasche

Note to self: the Coq implementation uses _exn functions in its fork of the bytecode interpreter:

# /usr/bin/ld: kernel/byterun/libcoqrun_stubs.a(coq_interp.o): in function `coq_interprete':
# /home/gasche/Prog/ocaml/github-trunk/_opam/.opam-switch/build/coq-core.8.19.2/_build/default/kernel/byterun/coq_interp.c:582: undefined reference to `caml_process_pending_actions_exn'

To fix this, Coq would need conditional compilations depending on the OCaml version, or we would have to restore caml_process_pending_actions_exn for backwards-compatibility.

Jul 15 '24 14:07 gasche

This is part of the public interface indeed. In fact, I checked that you preserved the functions part of the public interface so it did not raise alarm: caml_process_pending_actions_exn is still declared in signals.h indeed. It seems right to implement caml_process_pending_actions_exn in terms of caml_process_pending_actions_res for backwards-compatibility.

Jul 15 '24 15:07 gadmm

I restore caml_process_pending_actions_exn in #13304.

Jul 16 '24 09:07 gasche