ocaml-interop Investigate how to increase compatibility with ocaml-rs to ease integration

WIP

Motivation

ocaml-rs already solves a bunch of things that are planned in ocaml-interop, and ocaml-interop already provides a safe API that could benefit ocaml-rs. The idea her is to figure out what would be necessary for "sandwiching" ocaml-interop in-between ocaml-sys and ocaml-rs's higher level APIs.

What `ocaml-interop` solves for `ocaml-rs`

Lifetimes and proper abstractions to enforce correct usage of OCaml values across OCaml allocations
More accurate types for OCaml values. ocaml-rs represents OCaml values as Value, ocaml-interop uses OCaml<T>.
More fine-grained definitions for OCaml values, this removes a whole class of errors (so there is no need to validate operations performed on an OCaml value, because you know it is of the correct type), and also allows for the definition of new operations for new types.
Multiple ways to convert a value (ocaml-rs has a 1-on-1 mapping between Rust and OCaml values, in ocaml-interop there is more flexibility and control on what can convert into what).

What `ocaml-rs` solves for `ocaml-interop`

Derivers for structs/enums (ocaml-interop uses macros to define how to map between Rust structs/enums and OCaml records/variants)
Support for custom blocks.
Support for bigarrays.
Rust panic handling
OCaml exceptions from the Rust side, handling Err results to produce OCaml exceptions
Annotations for exported functions (ocaml-interop uses a wrapper macro for this)
Support for bytecode OCaml

Ideas for the future

Starting from the current ocaml-rs derivers, and combined with ocaml-interop's representation of OCaml values, implementations for interacting with OCaml records/variants could be generated for derive annotated structs so that allocations that result from OCaml->Rust conversions can be avoided, or controlled with more granularity.

Modifications Summary

Additions

WIP

ocaml-sys: a macro like ocaml_frame from ocaml-interop but for plain Values to replace caml_param?

Changes

WIP

ocaml-rs: Value would become a re-export of ocaml-interops OCaml<'a, T>.
ocaml-rs: Exported functions would have an implicit ocaml_frame instead of an caml_body.

Deprecations / Removals

WIP

ocaml-sys: caml_param and caml_body macros. ocaml_frame macro from ocaml-interop should be used instead (maybe a lower-level version that works using RawOCaml values could be implemented and added to ocaml-sys).
ocaml-rs: List, Array, etc types get replaced by OCaml<OCamlList<T>>, etc types from ocaml-interop.
ocaml-interop: most (all?) macros related to mapping between Rust<->OCaml Structs/Enums/RecordsVariants. Not needed when the ocaml-rs derivers exist (doesn't cover all the cases that the macros cover, but those are probably better handled by intermediary conversions anyway).
ocaml-interop: ocaml_export macro, not needed when the function annotations from ocaml-rs exist.

Other

Would be great if the OCaml version changes fragility situation could be improved somehow (ocaml-rs does a good job already, but it core OCaml could help us here, that would be great, because the OCaml runtime seems to be going through a bunch of changes lately).
At the moment, no synchronization is made by ocaml-interop (nor ocaml-rs?) for accessing the OCaml runtime, and is left for the user to handle. Having a solution baked into the libraries would be good.

Oct 12 '20 13:10 tizoc

@zshipko this is not complete (in particular, I have to go through ocaml-derive to get a better idea of what should change), but if you have any comment, suggestion, question, or whatever just leave a comment here.

Oct 20 '20 11:10 tizoc

Thanks, this looks good! I will go over it a few times and add a comment if anything else comes to mind, but it looks pretty complete already.

Working on the ocaml-derive port after the interface is somewhat stable is probably a good idea to avoid having to rewrite the macros as things change. Working with the Rust AST in proc-macros isn't very convenient.

Oct 21 '20 03:10 zshipko

@tizoc After spending a little time thinking about how to re-implement ocaml-derive using ocaml-interop, I realized the biggest difference is that we will need to consider is how to pre-declare rooted values. I imagine it could look something like:

#[ocaml::func]
#[ocaml::roots(a)]
pub fn my_function(x: i32) -> i32 {
    a_variable = x.to_ocaml(gc);
    a_variable
}

Another consideration is how we should allow gc to be renamed - in the example above it's implied, however we could add an additional optional macro parameter to specify the name:

#[ocaml::func(gc)]

I can start an initial implementation in the next few weeks - I'm sure I'll run into some more cases like this where the ergonomics are a little different between the two libraries.

Nov 06 '20 21:11 zshipko

Hi @zshipko! I haven't been active on this during the last week because I got sidetracked and been working on something else, but I plan to get back to this next week.

One thing that is very likely to change, is that the gc handle will not be passed around anymore to allocation functions, instead a runtime handle will be used.

It will work similarly to how the gc handle works, the difference is that it will have to be passed around (as a mutable borrow probably, just like the gc handle) from the "root" of the program where the OCaml runtime is initialized.

Any function that wants to perform an allocation or interact with the OCaml runtime will have to accept this as a parameter. The idea is for this handle to be the source of lifetimes for unrooted OCaml values (like the gc handle is now), and also the representation of the OCaml runtime lock. So Rust code that doesn't allocate will be able to "lose" this handle to signal that no OCaml runtime access is performed and that such code can run in parallel to the OCaml runtime (through caml_{acquire,release}_runtime_system / caml_{leave,enter}_blocking_section).

What needs to figured out is how to make this accessible to callback functions that are called from OCaml. My idea for now is to just have an unsafe (undocumented) function to get ahold of this runtime handle+lock, and then ocaml-derive will generate the code to obtain this handle.

Whatever you come up in terms of API regarding the gc handle will probably work for this new runtime handle, although some things may be slightly different, so I wanted to be aware of this just in case.

Nov 06 '20 22:11 tizoc

Thanks for the heads up, in that case I will take the opportunity to experiment with the interface a bit and see what works while those updates are in progress.

Nov 07 '20 02:11 zshipko

@zshipko I'm thinking that #[ocaml::roots(a)] may not be necessary, because if you need them you can just open a new frame. What is important is that the OCaml runtime handle is made available to these functions (although it is not a parameter).

I'm also thinking that there may be situations where declaring the name of the variable that will hold the OCaml runtime handle could be skipped too, making the Rust function automatically release the OCaml runtime lock to be able to run in parallel to the OCaml runtime. Anyway, this aspect will be clearer once I'm done with the aqcuire/release API.

Nov 23 '20 15:11 tizoc

@tizoc I have been looking at this a little closer and am wondering what more needs to be done by the derive macros other than wrapping the functions in ocaml_export? Or is your goal to eventually remove ocaml_export?

For instance, this:

ocaml_export! {
    fn twice_boxed_int(gc, num: OCaml<OCamlInt64>) -> OCaml<OCamlInt64> {
        let num = num.to_rust();
        let result = num * 2;
        to_ocaml!(gc, result)
    }
}

would just become:

#[derive(ocaml::func(gc))]
pub fn twice_boxed_int(num: OCaml<OCamlInt64>) -> OCaml<OCamlInt64> {
    let num = num.to_rust();
    let result = num * 2;
    to_ocaml!(gc, result)
}

Due to the introduction of OCaml<T> (which is a major improvement) it's unclear if that can be simplified further - I still have one idea I'd like to try, but do you have any thoughts?

Dec 17 '20 04:12 zshipko

@zshipko that is a valid implementation, yes. But eventually, I would like to get rid of that macro because the code is horrible. Note also that ocaml_export! doesn't do some nice things ocaml-derive does, like making the functions callable from bytecode OCaml, and handling panics to re-raise them to the OCaml side.

Maybe it is easier initially to just update the macros to handle that stuff, just to get the exported functions out of the way for now. If that is the case, then we can do that as a temporary solution.

Btw, somewhat related to this: we have been discussing with the INRIA guys that in Rust, using a caller-save convention to function arguments (so that the exported function receives OCamlRooted<T> arguments instead of OCaml<T>) may be a better option. Initially we would have to wrap functions so that they perform all the rooting on the Rust side, but eventually an annotation could be added to OCaml's external declarations so tell OCaml to pass pointers to rooted values instead (that could just be pointers to the OCaml stack).

This is something I am playing with now in ocaml-interop.

Dec 17 '20 11:12 tizoc

Cool, that sounds interesting!

I have an implementation I'm happy with in https://github.com/zshipko/ocaml-rs/tree/interop2 - Just got tests working, but there still could be some other things I need to clean up.

The caml_param and caml_body macros have been removed from ocaml-sys and instead ocaml-interop is used to re-implement ocaml::frame and ocaml::body. I've also marked all methods on Value as unsafe. This approach allows for users to pick between arguments/return values that can be automatically converted to the Rust representation as well asOCaml<T> or Value depending on their use case. For the most part, I've just implemented ToValue and FromValue for OCaml<T>, which means there's very little modification needed to derive to support OCaml<T>.

There now an implicit gc variable which holds the OCamlRuntime value (this can be renamed like #[ocaml::func(my_gc_name)]) - I'm still kind of thinking about this part a little bit.

Dec 18 '20 20:12 zshipko

Great. I expect to finish the new caller-saved args version of ocaml_export! soon, and then I can take a look at that branch.

Dec 18 '20 22:12 tizoc

Just as a heads-up, I've been working on my own on adding missing features within ocaml-interop.

The most polished branches are for bigarrays and passing Rust types to OCaml through custom blocks (will send PRs).

I've also looked into catching panics, I've noticed that the current approach in ocaml-rs is to replace the hook that's used to print backtraces and call the OCaml runtime to throw, which I think should be changed to a catch_unwind-based approach. Directly calling caml_raise_with_arg which will unwind across the FFI boundary gets into UB territory.

Dec 20 '20 09:12 g2p

@g2p that is great, thank you. Please note that right now I'm making some changes (#16 and #17) to the calling convention (switching to a caller-rooting approach, and getting rid of the calling macros), this will probably affect anything you have been working on.

Dec 20 '20 12:12 tizoc

I have worked as much as I can on this pre-rooting changes, and will have a look at porting to these changes after that. I'm pretty sure the change to have params rooted without opening an explicit frame (if I understand right) will fix the last bug I had to work around in my application.

Dec 22 '20 18:12 g2p

I'm sending PRs for trivial stuff, and I'll hold on the rest (though, the branches are published) until you tell me it's fine to base work on your changes.

Dec 22 '20 18:12 g2p

@g2p sounds good. I'm revising some stuff, and changing a few names. I will let you know once it is ready.

Dec 22 '20 18:12 tizoc

Merged some changes I have been working on since last month earlier today and released a new version.

The main thing that changed is that function calls now center around OCamlRef, which are references to OCamlCell values, which are themselves locations in memory containing an OCaml value (either a pointer or an immediate).

A new caller-rooting calling convention is used, in contrast to the old (and OCaml's C API) callee-rooting convention (that is, on ocaml-interop, now it is the responsibility of the caller to root the values, while before it was up to the callee). This simplified some things, and there is no longer any need for using macros to call OCaml functions.

Return values are still OCaml<T>, and have to be rooted by the caller after the value is returned. We are investigating an alternative approach using global roots (but a new implementation that should perform well, not what OCaml provides by default) that has the potential to simplify things more (this would for example, avoid the need to open new frames with local roots). Because of this I added a note on the README explaining that the current API should be considered unstable, because it is likely to keep changing.

Jan 20 '21 16:01 tizoc

Nice! I have that PR open on ocaml-rs that uses ocaml-interop under the hood (https://github.com/zshipko/ocaml-rs/pull/48).

I'm still doing some testing and verification but it's looking good and I hope to be able to make a new release sometime in the next few weeks. I will make sure to tag you in that before I move forward though.

Jan 20 '21 17:01 zshipko

ocaml-interop ocaml-interop copied to clipboard

Investigate how to increase compatibility with ocaml-rs to ease integration

Motivation

What ocaml-interop solves for ocaml-rs

What ocaml-rs solves for ocaml-interop

Ideas for the future

Modifications Summary

Additions

Changes

Deprecations / Removals

Other

ocaml-interop
ocaml-interop copied to clipboard

What `ocaml-interop` solves for `ocaml-rs`

What `ocaml-rs` solves for `ocaml-interop`