caml_result: an exception monad in C for resource-safe interfaces
In the OCaml runtime, some C functions that in turn call OCaml code come in two variant:
- the "raising' variant, for example
caml_callback, will raise exceptions directly, as OCaml code would - the 'encoded exception' variant, for example
caml_callback_exn, will not raise exceptions but encode them into the return value
The 'encoded exception' variant is important for resource-safe interfaces because the caller may want to run some cleanup/release code before propagating the exception, which the 'raising' variant cannot express -- C code cannot create OCaml exception handlers. On the other hand, the encoding of exceptions into the value type is a big hack that is unsafe in two different ways:
- dynamically, the
valueencoding exceptions are not valid OCaml values, the GC will blow up if it encounters them; callers have to follow a strict discipline of checking for an exception right away and discarding the fakevalue - statically, reusing the
valuetype means that the C compiler cannot check that we use these exceptions as intended, and will happily let us mix them with real values by mistake, making the dynamic unsafety harder to guard against
In #12407, @gadmm proposed to extend the approach of 'encoded exceptions' to more parts of the runtime, to make it possible to write resource-safe code in more places, but @xavierleroy pushed against the design due to the unsafety of encoded exceptions. To be able to move forward, we need an approach for resource-safe APIs that is consensual; the present PR, which grew out of the discussios in #12407 and in particular discussions with @gadmm, is a proposed approach.
The PR proposes a new 'result' variant: caml_callback_result is similar to caml_callback_exn, but instead of returning an unsafe value it returns a new, distinct C type caml_result, that properly represents either a value or an exception. This is more safe:
- dynamically, no invalid values are produced
- statically, the type is incompatible with
value, reminding us to deal with the exception in the caller
typedef struct {
_Bool is_exception;
value data;
} caml_result;
The present PR:
- introoduces this
caml_resulttype, - exposes
_resultvariants for all functions that previously had an_exnvariant, and - removes almost all uses of encoded exceptions in the runtime code itself.
For the _exn variants, the ones that were part of the public API are kept for compatibility, and those that were under CAML_INTERNALS are removed.
(The places where encoded exceptions are retained are the implementation of the caml_callback*_exn functions, which touch the bytecode interpreter and the assembly code in the runtime system. I preferred to leave them as-is, and wrap them inside caml_result on the outside.)
@gadmm told me (private communication) that he is fine with the overall approach if we can gather consensus around this.
This looks great, the encoded exception values returned by callback_exn are an easy trap to fall into.
(The places where encoded exceptions are retained are the implementation of the caml_callback*_exn functions, which touch the bytecode interpreter and the assembly code in the runtime system. I preferred to leave them as-is, and wrap them inside caml_result on the outside.)
This is the correct choice. The C ABI rules for returning a structure by value are tricky and vary significantly between platforms. It suits the C API (for which we have a compiler to implement those rules), but it would be unnecessarily complicated to follow these rules correctly in the per-platform assembly stubs.
This needs to be rebased on top of the statmemprof patches, and memprof.c adapted to the new _res convention.
I rebased this PR on top of trunk, and took all the standing review comments into account. Thanks a lot for all this detailed feedback.
Notable changes:
- @damiendoligez suggested to simplify
caml_process_pending_actions_result, this is done in a commit called "document and refactor [caml_process_pending_actions_result" (I am not giving specify hashes as they will be invalidated on my next rebase, so I recommend to look for it in the list of commits. - as @gadmm pointed out, the newly merged memprof.c codebase used
_exnfunctions quite a bit. I converted them to _result functions in "runtime: remove encoded exceptions from memprof.c", and this can be taken as an example of _result-heavy code. (@gadmm had some justified worries about the heaviness of the current naming conventions, maybe we can check that they are acceptable there) - @gadmm wanted another name for
caml_run_result, closer tocaml_raise_if_exception. I movedcaml_raise_if_exception(which was CAML_INTERNALS) out of the way, and renamedcaml_run_resultintocaml_get_value_or_raise-- but my mind is at peace with the idea that I may need to change it again and again. Instead of a function exposed in mlvalues.h and defined in fail_{byt,nat}.c, this is now an inline function of fail.h.
Besides the naming of caml_get_value_or_raise, there are some questions left in the air about whether we want the conversion functions between caml_result and encoded exceptions to be made public or not.
Of course I forgot two points.
I said that I would move from _Bool to int, but I am feeling nostalgic about a proper boolean type already -- and then maybe I need to use intnat, I am not sure. I am hoping that more knowledgeable people will conclusively say that one of these options is much better than the other.
@xavierleroy asked whether this caml_result design works in Rust. The answer is "yes, according to bindgen" (the standard Rust tool that takes C headers and generates compatible Rust definitions for them), as reported by @gadmm (quoting from his log):
Input: h.h
typedef unsigned long long value;
typedef struct caml_result {
_Bool is_exception;
value data;
} caml_result;
caml_result caml_process_pending_actions_res();
Output: bindgen h.h (with some boring tests elided)
/* automatically generated by rust-bindgen 0.69.4 */
pub type value = ::std::os::raw::c_ulonglong;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct caml_result {
pub is_exception: bool,
pub data: value,
}
#[test]
fn bindgen_test_layout_caml_result() {
const UNINIT: ::std::mem::MaybeUninit<caml_result> = ::std::mem::MaybeUninit::uninit();
let ptr = UNINIT.as_ptr();
assert_eq!(
::std::mem::size_of::<caml_result>(),
16usize,
concat!("Size of: ", stringify!(caml_result))
);
assert_eq!(
::std::mem::align_of::<caml_result>(),
8usize,
concat!("Alignment of ", stringify!(caml_result))
);
}
extern "C" {
pub fn caml_process_pending_actions_res() -> caml_result;
}
This suggests that the caml_result type does have a representation compatible with Rust structures (when constrained by the #[repr(C)] pragma), so interoperability should work fine. This is not an idiomatic way to deal with results-or-failures in Rust (as a user I would expect a Result<OCamlValue,OCamlValue> type instead), but conversion functions to can be used on the Rust side if desired. We cannot hope to do better today as the standard Result type in Rust is not repr(C), so there is no way to bind it directly from C in a compatible way. (While studying this I found the tool cbindgen that supports repr(C) enums as really tagged unions, but it does not apply here.)
I said that I would move from _Bool to int, but I am feeling nostalgic about a proper boolean type already -- and then maybe I need to use intnat
int should work, as we don't have to guarantee size and alignment for the caml_result type. I also thought of
typedef struct caml_result {
enum { OK = 0, ERR = 1 } status;
value data;
} caml_result;
which gets you even closer to the result type in Rust or OCaml. Plus, it leaves open the possibility of adding other kinds of results later, such as FATAL_ERROR = 2, i.e. a result that caml_get_value_or_raise would turn into a fatal error. Just an idea.
@gasche has already explored the idea of having the result type extensible in the future. However I don't see how it fits with the stability guarantees needed.
I don't think there is a decisive argument for/against any of the suggested representations, just pick any one.
I did a round of modification following the reviews, thanks!
For now I have used int for the is_exception field.
I also thought of
typedef struct caml_result { enum { OK = 0, ERR = 1 } status; value data; } caml_result;which gets you even closer to the result type in Rust or OCaml. Plus, it leaves open the possibility of adding other kinds of results later, such as
FATAL_ERROR = 2, i.e. a result thatcaml_get_value_or_raisewould turn into a fatal error. Just an idea.
I did, in a previous iteration of this PR, use an enum with two values: CAML_VALUE and CAML_EXCEPTION (I am not sure if enum names need namespacing in C, but I thought that they did). The result was fairly heavy/cumbersome, because I would then feel tempted to use switch on this field, and have to handle the default case, etc. Maybe I should revisit this with explicit equality tests instead of a switch.
The only form of extensibility that I could think of was an eventual future version of caml_result that could also represent unhandled effects. But this seemed like science-fiction, I doubted that the extension would be grateful, and I am skeptical that we would ever try to have C layers support continuation capture. I don't know about Fatal_error, but a closely related idea is Asynchronous_exception if the proposal to distinguish them from "normal exception" goes through.
What I would ideally like to do to have something more future-proof than the current definition is the following:
-
Expose a function to test if a result is an exception, instead of accessing the field directly. But I don't know how to name this function, in a way that would not conflict with
Is_exception_result, which was an excellent name but is now taken for encoded exceptions. (I thought of reappropriating the name by removing the previous function, but that sounds a bit wild for something outside CAML_INTERNALS.) -
Hide the definition of
caml_resultas a structure, to force users to go through theResult_valueandResult_exceptionconstructors, and I suppose some accessors as well. But I don't know how to do this in C in a way that does not hurt performance by breaking the inlining of the trivial wrappers.
Also: @gadmm proposed changing the convention from foo_result to foo_res. His arguments make sense, so I am thinking of giving the rename a try to see how it goes in the code, but I haven't had the time to do so yet.
Thanks for the feedback, I was going to insist about it :) In the related PR, the _result suffix often leads to ambiguities
- caml_threadstatus_new_exn -> caml_threadstatus_new_result
- caml_thread_domain_initialize_hook_exn -> caml_thread_domain_initialize_hook_result
- caml_check_error_exn -> caml_check_error_result
This PR has suffered from a decision paralysis on my side: I wasn't sure what definition of caml_result was the right definition to expose, and paralyzed by the fact that it would be publicly accessible and set in stone forever.
I just pushed the following change:
- the definition of mlvalues.h explicitly declares that the structure definition is private -- the type should be considered as opaque
- to let users use results without accessing the structure fields, a new
int caml_result_is_exception(caml_result)function is offered - this abstraction layer is now used almost everywhere in the runtime
Relevant code:
typedef struct caml_result_private caml_result;
/* This structure should be considered internal, its definition may
change in the future. Its public interface is formed of
- Result_value, Result_exception
- caml_result_is_exception
- caml_get_value_or_raise (in fail.h)
*/
struct caml_result_private {
int is_exception;
value data;
};
#define Result_value(v) (struct caml_result_private){ .is_exception = 0, .data = v }
#define Result_exception(exn) (struct caml_result_private){ .is_exception = 1, .data = exn }
Caml_inline int caml_result_is_exception(struct caml_result_private result)
{
return result.is_exception;
}
I am now at peace with the idea that an int is_exception field might not be the ideal definition to set in stone forever, and willing to move forward again with the PR.
My plan for now is to try the _res suffixing rename suggested by @gadmm, possibly push this rename if I like the result better, and then move toward merging the PR based on @damiendoligez's approval. Do not hesitate to speak up if you would like to see things done differently.
I pushed a rename of the foo_result functions into foo_res. I don't have a strong opinion on which is better.
This has the downside of being harder to understand, which is arguably also an upside as it clearly marks the functions as following a specific convention. This avoid clashes with some functions which already used _result for other purposes, for example sync_result or caml_lazy_read_result.
Example of code following the new convention:
/* Call signal handlers first */
caml_result result = caml_process_pending_signals_res();
if (caml_result_is_exception(result)) goto exception;
/* Call memprof callbacks */
result = caml_memprof_run_callbacks_res();
if (caml_result_is_exception(result)) goto exception;
/* Call finalisers */
result = caml_final_do_calls_res();
if (caml_result_is_exception(result)) goto exception;
Note that saying that the definition is private does not necessarily mean that you can change it later. Adding a new case can still break programs. So I do not really see the point.
It is necessary to update the documentation, how do you see it being approached?
Thanks for changing to _res, this definitely reduces the risks of ambiguity.
It is necessary to update the documentation, how do you see it being approached?
Yes, I agree that I should probably update the documentation. I may try to prepare a draft PR to submit separately, but hold off from any merge-related action here before that documentation is at least submitted.
I propose documentation for caml_result in the FFI chapter of the manual in #13216.
Note to self: the Coq implementation uses _exn functions in its fork of the bytecode interpreter:
# /usr/bin/ld: kernel/byterun/libcoqrun_stubs.a(coq_interp.o): in function `coq_interprete':
# /home/gasche/Prog/ocaml/github-trunk/_opam/.opam-switch/build/coq-core.8.19.2/_build/default/kernel/byterun/coq_interp.c:582: undefined reference to `caml_process_pending_actions_exn'
To fix this, Coq would need conditional compilations depending on the OCaml version, or we would have to restore caml_process_pending_actions_exn for backwards-compatibility.
This is part of the public interface indeed. In fact, I checked that you preserved the functions part of the public interface so it did not raise alarm: caml_process_pending_actions_exn is still declared in signals.h indeed. It seems right to implement caml_process_pending_actions_exn in terms of caml_process_pending_actions_res for backwards-compatibility.
I restore caml_process_pending_actions_exn in #13304.