interoptopus: Support for dataful enums?

Dataful enums are a powerful feature of Rust's type system, and unfortunately this means they are difficult to represent elegantly in many other languages. However, at minimum, a dataful enum in Rust can be represented as a tagged union in C, and in languages with inheritance it could be translated into a class or record type hierarchy. Adding support for dataful enums, even rudimentary support that only maps them to tagged unions in the target languages, would be a very valuable feature for Rust interoperability.

As an example, here is a simple dataful enum on the Rust side:

#[repr(C, u32)]
pub enum Value {
    VByte(u8),
    VFloat(f32),
}
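Why a tagged union is a faithful C fallback: RFC 2195 specifies that a `#[repr(C, u32)]` enum has exactly the layout of a `#[repr(C)]` struct holding a `u32`-sized tag, followed by a `#[repr(C)]` union of one `#[repr(C)]` struct per variant. The sketch below writes that mirror out by hand and reads a `Value` through it. `ValueTag`, `VByteFields`, `VFloatFields`, `ValueFields`, `ValueRepr` and `as_repr` are hypothetical names chosen for illustration. They are not interoptopus API.

```rust
#[repr(C, u32)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    VByte(u8),
    VFloat(f32),
}

// Hypothetical hand-written mirror of the RFC 2195 layout:
// a u32-sized tag followed by a union of per-variant structs.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValueTag {
    VByte = 0,
    VFloat = 1,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct VByteFields(pub u8);

#[repr(C)]
#[derive(Clone, Copy)]
pub struct VFloatFields(pub f32);

#[repr(C)]
#[derive(Clone, Copy)]
pub union ValueFields {
    pub v_byte: VByteFields,
    pub v_float: VFloatFields,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct ValueRepr {
    pub tag: ValueTag,
    pub fields: ValueFields,
}

/// Reinterprets a `Value` as its tagged-union mirror. This relies on RFC 2195
/// guaranteeing identical size, alignment and field placement for both types.
pub fn as_repr(v: &Value) -> ValueRepr {
    unsafe { std::ptr::read(v as *const Value as *const ValueRepr) }
}
```

With this layout the tag sits at offset 0, the payload at offset 4, and the whole value is 8 bytes. This is the layout a C header (or a C# explicit-layout struct) has to reproduce.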

And its equivalent on the C# side, including the facility to perform basic matching on the variant:

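using System;
using System.Runtime.InteropServices;

// [FieldOffset] is only honoured with an explicit layout.
[StructLayout(LayoutKind.Explicit)]
public struct Value
{
    // Must match the Rust discriminant type (u32).
    private enum Tag : uint
    {
        VByte,
        VFloat,
    }

    // The payload starts at offset 4: the 4-byte tag, rounded up to the payload's 4-byte alignment (from f32).
    [FieldOffset(0)] private Tag tag;
    [FieldOffset(4)] private Byte byte_data;
    [FieldOffset(4)] private Single float_data;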

    public void Match(Action<byte> byte_f, Action<Single> float_f)
    {
        Match<object>(b =>
        {
            byte_f(b); return null;
        },
        f =>
        {
            float_f(f); return null;
        });
    }
    public R Match<R>(Func<byte, R> byte_f, Func<Single, R> float_f)
    {
        switch (tag)
        {
            case Tag.VByte:
                return byte_f(byte_data);
            case Tag.VFloat:
                return float_f(float_data);
            default:
                throw new ArgumentOutOfRangeException();
        }
    }
}
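The `default: throw` branch in `Match<R>` matters because a tag that arrives from foreign code is just an integer. The sketch below shows the same closure-based dispatch on the Rust side over a raw tagged union, with an unknown tag reported as an error instead of an exception. `RawFields`, `RawValue` and `match_with` are hypothetical names, and the sketch assumes the producer wrote the union field that the tag names.

```rust
/// Hypothetical raw form as it might arrive from foreign code: the tag is a
/// plain u32, so it may hold a value that names no variant.
#[repr(C)]
#[derive(Clone, Copy)]
pub union RawFields {
    pub byte_data: u8,
    pub float_data: f32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawValue {
    pub tag: u32,
    pub fields: RawFields,
}

impl RawValue {
    /// Exhaustive dispatch through closures, mirroring the C# `Match<R>`;
    /// an unknown tag is returned as `Err(tag)`.
    pub fn match_with<R>(
        &self,
        byte_f: impl FnOnce(u8) -> R,
        float_f: impl FnOnce(f32) -> R,
    ) -> Result<R, u32> {
        // SAFETY: assumes the producer wrote the field that `tag` names.
        unsafe {
            match self.tag {
                0 => Ok(byte_f(self.fields.byte_data)),
                1 => Ok(float_f(self.fields.float_data)),
                other => Err(other),
            }
        }
    }
}
```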

Optionally, this tagged union could be converted to a class hierarchy, if this is deemed valuable:

public abstract class ValueClass
{
    public static ValueClass From(Value v)
    {
        return v.Match<ValueClass>(
            b => new ByteValue(b),
            f => new FloatValue(f)
        );
    }
    public class ByteValue : ValueClass
    {
        public Byte b;

        public ByteValue(byte b)
        {
            this.b = b;
        }
    }
    public class FloatValue : ValueClass
    {
        public Single f;

        public FloatValue(float f)
        {
            this.f = f;
        }
    }
}

Apr 28 '22 04:04 Zoybean

Generally, having such a feature would be nice and I think some experimentation might be warranted. That experimentation should probably address:

How difficult / brittle will the proc-macro code be that has to handle an #[ffi_type] on the enum? Something must parse the added complexity and translate it into an interoptopus::lang::rust type.
What does the fallback C look like? It should be pretty obvious, but it would still be good to see for some more complex cases.
How would high-level backends (or at least one) translate that? You already gave C# examples, but I have to say I don't like their ergonomics, at least not those of struct Value; imagine a match with ~10 variants. (On a side note, I don't even like our existing OptionXXX and SliceXXX code gen, but that seemingly was as good as C# allowed.)
Who is responsible for determining [FieldOffset(X)]? So far I have tried to avoid computing those myself, to avoid compiler- or platform-specific UB accidents.
Do they / should they compose with FFIOption<T> or FFISlice<T>?
Some practical assessment: once you have a good idea how backends can handle these, would one actually want to use them from Rust, or are void pointers and a bit of unsafe the better option?
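One possible answer to the [FieldOffset(X)] question is to let the compiler report the offsets rather than computing them by hand. `core::mem::offset_of!` (stable since Rust 1.77) gives the offset of a field in a `#[repr(C)]` mirror struct, and a `const` assertion turns a silent layout change into a build error. The sketch below is an assumption, not how interoptopus works today. `ValuePayload`, `ValueRepr` and `csharp_fields` are hypothetical. Note also that the offsets are those of the target being compiled, so a generator built this way would have to run for the same target as the final library.

```rust
use std::mem::offset_of;

#[repr(C)]
#[derive(Clone, Copy)]
pub union ValuePayload {
    pub byte_data: u8,
    pub float_data: f32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct ValueRepr {
    pub tag: u32,
    pub payload: ValuePayload,
}

// Fail the build, rather than segfault at runtime, if a refactoring moves the payload.
const _: () = assert!(offset_of!(ValueRepr, payload) == 4);

/// Hypothetical generator step: emit C# explicit-layout field declarations
/// from offsets reported by the compiler instead of hand-computed ones.
pub fn csharp_fields() -> Vec<String> {
    let payload = offset_of!(ValueRepr, payload);
    vec![
        format!("[FieldOffset({})] private Tag tag;", offset_of!(ValueRepr, tag)),
        format!("[FieldOffset({})] private byte byte_data;", payload),
        format!("[FieldOffset({})] private float float_data;", payload),
    ]
}
```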

I probably won't do this myself as I don't really have a need for tagged enums in my current APIs, but any decently clean PR that has answers to the points above has a good chance of getting merged. I can also help answer questions.

Apr 28 '22 10:04 ralfbiedert

I really appreciate the thorough response! I don't have answers yet, just thoughts for now.

Regarding proc-macros, I have next to no experience, so I can't comment directly.
Regarding the C fallback: I haven't written C in a few years, and I've never been confident in it, but I can look into it. This may be a solved problem for Rust in any case; there may be crates with prior art.
Regarding ergonomics of access: the Match methods give the equivalent of an exhaustive match over all variants on the Rust side, so I figure any solution would have to include it, or something at least as expressive.
- I've included some sample code and further discussion below, expanding on what I've written here.
Regarding field offsets: at first I thought this would be trivial for unary variants with repr(int), but I had forgotten about alignment. Either way, I think it is fair to say that unary variants are the simplest case.
- Perhaps for a first version, the user will be in charge of (unsafely) determining the size of the discriminant-plus-padding, allowing unary variants to be considered.
- Possibly, n-ary variants could be treated the same as unary variants with a struct element? I don't know if these are guaranteed to work that way with repr(C), that will need some investigation.
- Otherwise, perhaps unary variants could be the first version, and n-ary variants could be allowed in a subsequent version?
Regarding composition with other types: I don't know the details of how niche optimisations work with repr(C) outside of the nullable-pointer optimisation. That will require some reading.
Regarding wanting to use them: anecdotally, yes. I want to use dataful enums in Rust-C# FFI for a work project, and the effort of writing the bindings for two important dataful enums (one with 5 variants and one with 400-or-so unit variants) was too high to be worthwhile, which is what led me to write this issue. I use C# a lot at work and enums a lot in Rust, so I suspect that more opportunities like this will come up.
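The alignment point can be made concrete. Per RFC 2195, the payload union starts at the tag's size rounded up to the union's alignment. An n-ary variant's fields are laid out as their own `#[repr(C)]` struct, so n-ary variants behave like unary variants with a struct payload. The hypothetical `Wide` enum below adds an `f64` variant, which pushes the payload to offset 8 on targets where `f64` is 8-byte aligned. `WideRepr`, `WidePayload`, `VPairFields` and both helper functions are illustrative names only.

```rust
use std::mem::{align_of, offset_of, size_of};

#[repr(C, u32)]
pub enum Wide {
    VByte(u8),
    VDouble(f64),
    VPair(u8, f32),
}

// Hypothetical mirror per RFC 2195. Single-field variants are stored directly,
// which has the same layout as a single-field repr(C) struct.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct VPairFields(pub u8, pub f32);

#[repr(C)]
#[derive(Clone, Copy)]
pub union WidePayload {
    pub v_byte: u8,
    pub v_double: f64,
    pub v_pair: VPairFields,
}

#[repr(C)]
pub struct WideRepr {
    pub tag: u32,
    pub payload: WidePayload,
}

/// Offset as reported by the compiler.
pub fn payload_offset() -> usize {
    offset_of!(WideRepr, payload)
}

/// Offset by rule: the 4-byte tag rounded up to the payload's alignment.
pub fn expected_payload_offset() -> usize {
    let align = align_of::<WidePayload>();
    (size_of::<u32>() + align - 1) / align * align
}
```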

Further discussion of ergonomics:

I recognise that ergonomics are both necessary for usability and potentially very subjective, so I'm keen to discuss them.

I think the ideal situation would be:

generating a struct to use on the FFI boundary, and a class hierarchy to hand back to the user.
the struct implements a single method to convert it to an instance of that class hierarchy
a thin wrapper on the FFI methods performs the necessary conversion of any types that include a dataful enum, from struct to class.
the generated classes are all partial classes, allowing the user to separately implement their own logic on the type.

Unfortunately, I don't know how many languages other than C# allow partial classes (i.e. multiple additive definitions / implementations of a class). Given that caveat, I think providing Match on the class (possibly as an opt-in) would be a bare-minimum requirement, in case the target language does not allow partial classes. I definitely don't think Match needs to be the entire interface, though there could be a tradeoff between expressivity and code size, if that matters.

Here's an example of a Match on the class version.

public abstract partial class ValueClass
{
    public abstract R Match<R>(Func<byte, R> byte_f, Func<Single, R> float_f);

    public partial class ByteValue : ValueClass
    {
        public Byte b;

        public ByteValue(byte b)
        {
            this.b = b;
        }

        public override R Match<R>(Func<byte, R> byte_f, Func<float, R> float_f)
        {
            return byte_f(b);
        }
    }
    public partial class FloatValue : ValueClass
    {
        public Single f;

        public FloatValue(float f)
        {
            this.f = f;
        }

        public override R Match<R>(Func<byte, R> byte_f, Func<float, R> float_f)
        {
            return float_f(f);
        }
    }
}

Once the match operation is possible (on a class, a struct or otherwise), it should be trivial to implement other methods "for free", such as IsVariantX (for matches!), IfVariantX (for if let) and UnwrapVariantX. However, Rust constructs like if let ... else { return } or let else may be impossible to implement with this scheme: it relies on putting all the logic in closures and method calls, which cannot return from the calling environment. Here's an example implementation of other control-flow methods in terms of Match:

public void Match(Action<byte> byte_f, Action<Single> float_f)
{
    Match<object>(
        b =>
        {
            byte_f(b);
            return null;
        },
        f =>
        {
            float_f(f);
            return null;
        }
    );
}
public R IfByte<R>(Func<byte, R> byte_f, R def)
{
    return Match(byte_f, f => def);
}
public R IfFloat<R>(Func<float, R> float_f, R def)
{
    return Match(b => def, float_f);
}
public void IfByte(Action<byte> byte_f)
{
    Match(byte_f, f => { });
}
public void IfFloat(Action<float> float_f)
{
    Match(b => { }, float_f);
}
public R Switch<R>(R byte_v, R float_v)
{
    return Match(b => byte_v, f => float_v);
}
public bool IsByte()
{
    return Switch(true, false);
}
public bool IsFloat()
{
    return Switch(false, true);
}
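The same "derive everything from one exhaustive match" idea is sketched below in Rust. `match_with`, `switch`, `is_byte`, `is_float`, `if_byte` and `if_float` are hypothetical names that mirror the C# methods above. Only `match_with` inspects the variant; every other method is written purely in terms of it. As in C#, the closures cannot return from the caller, so let-else-style early returns still don't map onto this pattern.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    VByte(u8),
    VFloat(f32),
}

impl Value {
    /// The single primitive: exhaustive dispatch through closures.
    pub fn match_with<R>(self, byte_f: impl FnOnce(u8) -> R, float_f: impl FnOnce(f32) -> R) -> R {
        match self {
            Value::VByte(b) => byte_f(b),
            Value::VFloat(f) => float_f(f),
        }
    }

    // Everything below is derived from `match_with` alone.
    pub fn switch<R>(self, byte_v: R, float_v: R) -> R {
        self.match_with(|_| byte_v, |_| float_v)
    }

    pub fn is_byte(self) -> bool {
        self.switch(true, false)
    }

    pub fn is_float(self) -> bool {
        self.switch(false, true)
    }

    pub fn if_byte<R>(self, byte_f: impl FnOnce(u8) -> R, def: R) -> R {
        self.match_with(byte_f, |_| def)
    }

    pub fn if_float<R>(self, float_f: impl FnOnce(f32) -> R, def: R) -> R {
        self.match_with(|_| def, float_f)
    }
}
```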

Apr 29 '22 02:04 Zoybean

Some random thoughts:

Perhaps for a first version, the user will be in charge of (unsafely) determining the size of the discriminant-plus-padding

You can already do something similar if you implement CTypeInfo yourself, and I tried it a few times back when the proc macros were less advanced. I also have distinct memories of segfaults in the days afterwards, while refactoring structs. Specifying alignments manually via attributes might be 'closer' (in terms of lines of code), but I'm almost certain I wouldn't want to use those in a production API.

Possibly, n-ary variants could be treated the same as unary variants with a struct element? I don't know if these are guaranteed to work that way with repr(C), that will need some investigation.

I agree this needs some investigation; a probable candidate is #[repr(transparent)]. That said, I have faint memories of repr(transparent) warnings w.r.t. composability, but I'd have to look those up in the unsafe code guidelines.

Unfortunately, though I don't know how many languages other than C# allow for partial classes

I would look at each language in total isolation. The most important point is Rust -> C is sane (sound and reasonable). Then it's just up to each other language to have 'nice enough' FFI utils.

About your C# proposal: I'd have to play with it, but I think your IfXXX is the way to go, probably returning this instead so you can chain the calls, plus some extra AssertXXX (or similar) methods that return R or throw.
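A rough Rust rendering of that suggestion, for comparison. `if_byte` and `if_float` return `&Self` so calls chain, and `assert_byte` returns the payload or panics, which is the Rust analog of "return R or throw". All method names here are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Value {
    VByte(u8),
    VFloat(f32),
}

impl Value {
    /// Runs `f` only for the VByte variant and returns `self` so calls chain.
    pub fn if_byte(&self, f: impl FnOnce(u8)) -> &Self {
        if let Value::VByte(b) = *self {
            f(b);
        }
        self
    }

    /// Runs `f` only for the VFloat variant and returns `self` so calls chain.
    pub fn if_float(&self, f: impl FnOnce(f32)) -> &Self {
        if let Value::VFloat(x) = *self {
            f(x);
        }
        self
    }

    /// Returns the payload, or panics if the value is another variant.
    pub fn assert_byte(&self) -> u8 {
        match *self {
            Value::VByte(b) => b,
            other => panic!("expected VByte, found {:?}", other),
        }
    }
}
```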

Apr 30 '22 21:04 ralfbiedert

If I'm not mistaken, all interoptopus types are required to be #[repr(C)], which means it's safe to calculate the field offsets. See the example for Layout::extend and the Type Layout section of the Rust Reference.
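The `Layout::extend` documentation shows how to compute `#[repr(C)]` field offsets one field at a time. The sketch below follows that pattern for the tag-plus-payload struct and can be cross-checked against `offset_of!`. `repr_c_offsets`, `ValuePayload` and `ValueRepr` are hypothetical names. Whether a generator should compute offsets this way or ask the compiler is still the open question raised above.

```rust
use std::alloc::{Layout, LayoutError};

/// Computes the `#[repr(C)]` field offsets and overall layout for a list of
/// field layouts, following the pattern in the `Layout::extend` docs.
pub fn repr_c_offsets(fields: &[Layout]) -> Result<(Vec<usize>, Layout), LayoutError> {
    let mut layout = Layout::from_size_align(0, 1)?;
    let mut offsets = Vec::with_capacity(fields.len());
    for &field in fields {
        // `extend` inserts padding so `field` is properly aligned.
        let (new_layout, offset) = layout.extend(field)?;
        layout = new_layout;
        offsets.push(offset);
    }
    // Trailing padding: repr(C) sizes are a multiple of the alignment.
    Ok((offsets, layout.pad_to_align()))
}

#[repr(C)]
#[derive(Clone, Copy)]
pub union ValuePayload {
    pub byte_data: u8,
    pub float_data: f32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct ValueRepr {
    pub tag: u32,
    pub payload: ValuePayload,
}
```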

Jan 14 '23 11:01 Waelwindows

Noting for anyone who implements this for the C# backend (possibly me): the F# language spec has a section on the compiled form of F#'s own discriminated unions as seen from other .NET languages, including C#. This seems like a good form to mimic in interoptopus' C# backend. In the latest version at the time of writing (F# 4.1), this is section 8.5.4 "Compiled Form of Union Types for Use from Other CLI Languages".

Apr 11 '23 01:04 Zoybean