ocaml-interop
ocaml-interop copied to clipboard
Investigate how to increase compatibility with ocaml-rs to ease integration
WIP
Motivation
ocaml-rs already solves a bunch of things that are planned in ocaml-interop
, and ocaml-interop
already provides a safe API that could benefit ocaml-rs
. The idea her is to figure out what would be necessary for "sandwiching" ocaml-interop
in-between ocaml-sys
and ocaml-rs
's higher level APIs.
What ocaml-interop
solves for ocaml-rs
- Lifetimes and proper abstractions to enforce correct usage of OCaml values across OCaml allocations
- More accurate types for OCaml values.
ocaml-rs
represents OCaml values asValue
,ocaml-interop
usesOCaml<T>
. - More fine-grained definitions for OCaml values, this removes a whole class of errors (so there is no need to validate operations performed on an OCaml value, because you know it is of the correct type), and also allows for the definition of new operations for new types.
- Multiple ways to convert a value (
ocaml-rs
has a 1-on-1 mapping between Rust and OCaml values, inocaml-interop
there is more flexibility and control on what can convert into what).
What ocaml-rs
solves for ocaml-interop
- Derivers for structs/enums (
ocaml-interop
uses macros to define how to map between Rust structs/enums and OCaml records/variants) - Support for custom blocks.
- Support for bigarrays.
- Rust panic handling
- OCaml exceptions from the Rust side, handling
Err
results to produce OCaml exceptions - Annotations for exported functions (
ocaml-interop
uses a wrapper macro for this) - Support for bytecode OCaml
Ideas for the future
- Starting from the current
ocaml-rs
derivers, and combined withocaml-interop
's representation of OCaml values, implementations for interacting with OCaml records/variants could be generated forderive
annotated structs so that allocations that result from OCaml->Rust conversions can be avoided, or controlled with more granularity.
Modifications Summary
Additions
WIP
-
ocaml-sys
: a macro likeocaml_frame
fromocaml-interop
but for plainValue
s to replacecaml_param
?
Changes
WIP
-
ocaml-rs
:Value
would become a re-export ofocaml-interop
sOCaml<'a, T>
. -
ocaml-rs
: Exported functions would have an implicitocaml_frame
instead of ancaml_body
.
Deprecations / Removals
WIP
-
ocaml-sys
:caml_param
andcaml_body
macros.ocaml_frame
macro fromocaml-interop
should be used instead (maybe a lower-level version that works usingRawOCaml
values could be implemented and added toocaml-sys
). -
ocaml-rs
:List
,Array
, etc types get replaced byOCaml<OCamlList<T>>
, etc types fromocaml-interop
. -
ocaml-interop
: most (all?) macros related to mapping between Rust<->OCaml Structs/Enums/RecordsVariants. Not needed when theocaml-rs
derivers exist (doesn't cover all the cases that the macros cover, but those are probably better handled by intermediary conversions anyway). -
ocaml-interop
:ocaml_export
macro, not needed when the function annotations fromocaml-rs
exist.
Other
- Would be great if the OCaml version changes fragility situation could be improved somehow (
ocaml-rs
does a good job already, but it core OCaml could help us here, that would be great, because the OCaml runtime seems to be going through a bunch of changes lately). - At the moment, no synchronization is made by
ocaml-interop
(norocaml-rs
?) for accessing the OCaml runtime, and is left for the user to handle. Having a solution baked into the libraries would be good.
@zshipko this is not complete (in particular, I have to go through ocaml-derive to get a better idea of what should change), but if you have any comment, suggestion, question, or whatever just leave a comment here.
Thanks, this looks good! I will go over it a few times and add a comment if anything else comes to mind, but it looks pretty complete already.
Working on the ocaml-derive
port after the interface is somewhat stable is probably a good idea to avoid having to rewrite the macros as things change. Working with the Rust AST in proc-macros isn't very convenient.
@tizoc After spending a little time thinking about how to re-implement ocaml-derive
using ocaml-interop
, I realized the biggest difference is that we will need to consider is how to pre-declare rooted values. I imagine it could look something like:
#[ocaml::func]
#[ocaml::roots(a)]
pub fn my_function(x: i32) -> i32 {
a_variable = x.to_ocaml(gc);
a_variable
}
Another consideration is how we should allow gc
to be renamed - in the example above it's implied, however we could add an additional optional macro parameter to specify the name:
#[ocaml::func(gc)]
I can start an initial implementation in the next few weeks - I'm sure I'll run into some more cases like this where the ergonomics are a little different between the two libraries.
Hi @zshipko! I haven't been active on this during the last week because I got sidetracked and been working on something else, but I plan to get back to this next week.
One thing that is very likely to change, is that the gc
handle will not be passed around anymore to allocation functions, instead a runtime handle will be used.
It will work similarly to how the gc
handle works, the difference is that it will have to be passed around (as a mutable borrow probably, just like the gc
handle) from the "root" of the program where the OCaml runtime is initialized.
Any function that wants to perform an allocation or interact with the OCaml runtime will have to accept this as a parameter. The idea is for this handle to be the source of lifetimes for unrooted OCaml values (like the gc
handle is now), and also the representation of the OCaml runtime lock. So Rust code that doesn't allocate will be able to "lose" this handle to signal that no OCaml runtime access is performed and that such code can run in parallel to the OCaml runtime (through caml_{acquire,release}_runtime_system
/ caml_{leave,enter}_blocking_section
).
What needs to figured out is how to make this accessible to callback functions that are called from OCaml. My idea for now is to just have an unsafe (undocumented) function to get ahold of this runtime handle+lock, and then ocaml-derive
will generate the code to obtain this handle.
Whatever you come up in terms of API regarding the gc
handle will probably work for this new runtime handle, although some things may be slightly different, so I wanted to be aware of this just in case.
Thanks for the heads up, in that case I will take the opportunity to experiment with the interface a bit and see what works while those updates are in progress.
@zshipko I'm thinking that #[ocaml::roots(a)]
may not be necessary, because if you need them you can just open a new frame. What is important is that the OCaml runtime handle is made available to these functions (although it is not a parameter).
I'm also thinking that there may be situations where declaring the name of the variable that will hold the OCaml runtime handle could be skipped too, making the Rust function automatically release the OCaml runtime lock to be able to run in parallel to the OCaml runtime. Anyway, this aspect will be clearer once I'm done with the aqcuire/release API.
@tizoc I have been looking at this a little closer and am wondering what more needs to be done by the derive macros other than wrapping the functions in ocaml_export
? Or is your goal to eventually remove ocaml_export
?
For instance, this:
ocaml_export! {
fn twice_boxed_int(gc, num: OCaml<OCamlInt64>) -> OCaml<OCamlInt64> {
let num = num.to_rust();
let result = num * 2;
to_ocaml!(gc, result)
}
}
would just become:
#[derive(ocaml::func(gc))]
pub fn twice_boxed_int(num: OCaml<OCamlInt64>) -> OCaml<OCamlInt64> {
let num = num.to_rust();
let result = num * 2;
to_ocaml!(gc, result)
}
Due to the introduction of OCaml<T>
(which is a major improvement) it's unclear if that can be simplified further - I still have one idea I'd like to try, but do you have any thoughts?
@zshipko that is a valid implementation, yes. But eventually, I would like to get rid of that macro because the code is horrible. Note also that ocaml_export!
doesn't do some nice things ocaml-derive does, like making the functions callable from bytecode OCaml, and handling panics to re-raise them to the OCaml side.
Maybe it is easier initially to just update the macros to handle that stuff, just to get the exported functions out of the way for now. If that is the case, then we can do that as a temporary solution.
Btw, somewhat related to this: we have been discussing with the INRIA guys that in Rust, using a caller-save convention to function arguments (so that the exported function receives OCamlRooted<T>
arguments instead of OCaml<T>
) may be a better option. Initially we would have to wrap functions so that they perform all the rooting on the Rust side, but eventually an annotation could be added to OCaml's external
declarations so tell OCaml to pass pointers to rooted values instead (that could just be pointers to the OCaml stack).
This is something I am playing with now in ocaml-interop.
Cool, that sounds interesting!
I have an implementation I'm happy with in https://github.com/zshipko/ocaml-rs/tree/interop2 - Just got tests working, but there still could be some other things I need to clean up.
The caml_param
and caml_body
macros have been removed from ocaml-sys
and instead ocaml-interop
is used to re-implement ocaml::frame
and ocaml::body
. I've also marked all methods on Value
as unsafe. This approach allows for users to pick between arguments/return values that can be automatically converted to the Rust representation as well asOCaml<T>
or Value
depending on their use case. For the most part, I've just implemented ToValue
and FromValue
for OCaml<T>
, which means there's very little modification needed to derive
to support OCaml<T>
.
There now an implicit gc
variable which holds the OCamlRuntime
value (this can be renamed like #[ocaml::func(my_gc_name)]
) - I'm still kind of thinking about this part a little bit.
Great. I expect to finish the new caller-saved args version of ocaml_export!
soon, and then I can take a look at that branch.
Just as a heads-up, I've been working on my own on adding missing features within ocaml-interop.
The most polished branches are for bigarrays and passing Rust types to OCaml through custom blocks (will send PRs).
I've also looked into catching panics, I've noticed that the current approach in ocaml-rs is to replace the hook that's used to print backtraces and call the OCaml runtime to throw, which I think should be changed to a catch_unwind-based approach. Directly calling caml_raise_with_arg which will unwind across the FFI boundary gets into UB territory.
@g2p that is great, thank you. Please note that right now I'm making some changes (#16 and #17) to the calling convention (switching to a caller-rooting approach, and getting rid of the calling macros), this will probably affect anything you have been working on.
I have worked as much as I can on this pre-rooting changes, and will have a look at porting to these changes after that. I'm pretty sure the change to have params rooted without opening an explicit frame (if I understand right) will fix the last bug I had to work around in my application.
I'm sending PRs for trivial stuff, and I'll hold on the rest (though, the branches are published) until you tell me it's fine to base work on your changes.
@g2p sounds good. I'm revising some stuff, and changing a few names. I will let you know once it is ready.
Merged some changes I have been working on since last month earlier today and released a new version.
The main thing that changed is that function calls now center around OCamlRef
, which are references to OCamlCell
values, which are themselves locations in memory containing an OCaml value (either a pointer or an immediate).
A new caller-rooting calling convention is used, in contrast to the old (and OCaml's C API) callee-rooting convention (that is, on ocaml-interop, now it is the responsibility of the caller to root the values, while before it was up to the callee). This simplified some things, and there is no longer any need for using macros to call OCaml functions.
Return values are still OCaml<T>
, and have to be rooted by the caller after the value is returned. We are investigating an alternative approach using global roots (but a new implementation that should perform well, not what OCaml provides by default) that has the potential to simplify things more (this would for example, avoid the need to open new frames with local roots). Because of this I added a note on the README explaining that the current API should be considered unstable, because it is likely to keep changing.
Nice! I have that PR open on ocaml-rs that uses ocaml-interop under the hood (https://github.com/zshipko/ocaml-rs/pull/48).
I'm still doing some testing and verification but it's looking good and I hope to be able to make a new release sometime in the next few weeks. I will make sure to tag you in that before I move forward though.