A type-safe representation of tagged unions

Open Hirrolot opened this issue 7 months ago • 3 comments

Hi there! I came from your Reddit post and found this project fascinating.

While reading the sources, I've found the header include/lone/types.h, which uses a common technique of tagged unions. Most notably, struct lone_value:

struct lone_value {
	// Auxiliary data...

	enum lone_type type;

	union {
		struct lone_module module;
		struct lone_function function;
		struct lone_primitive primitive;
		struct lone_list list;
		struct lone_vector vector;
		struct lone_table table;
		struct lone_bytes bytes;
		struct lone_pointer pointer;
		long integer;
	};
};

The downside of this approach is that case analysis is easy to get wrong, either by 1) reading a union member without checking the tag, or by 2) checking for tag A and then reading the member that belongs to B. The compiler cannot detect either mistake, so the bug silently creeps into the final executable.
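To make the two failure modes concrete, here is a minimal sketch in plain C. The two-variant struct value and all function names are hypothetical and far smaller than struct lone_value; the only point is that both buggy functions compile without a diagnostic, while the best a hand-written accessor can do is catch the mismatch at run time.

```c
#include <assert.h>

/* Hypothetical two-variant tagged union, not taken from Lone. */
enum value_type { VALUE_INTEGER, VALUE_TEXT };

struct value {
    enum value_type type;
    union {
        long integer;
        const char *text;
    };
};

/* Mistake 1: the tag is never checked before the member is read. */
long unchecked_integer(const struct value *v) {
    return v->integer;
}

/* Mistake 2: the tag is checked, but for the wrong variant. */
long mismatched_integer(const struct value *v) {
    if (v->type == VALUE_TEXT) {
        return v->integer; /* should have read v->text here */
    }
    return 0;
}

/* Hand-written accessor: the tag is verified, but only at run time. */
long expect_integer(const struct value *v) {
    assert(v->type == VALUE_INTEGER);
    return v->integer;
}
```

Neither unchecked_integer nor mismatched_integer is rejected by the compiler, which is exactly the gap described above.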

Instead of managing tagged unions manually, I would suggest using Datatype99, which is a header-only library designed specifically to deal with the problem of tagged unions. struct lone_value would look as follows [^1]:

datatype(
    LoneValue,
    (LoneModule, struct lone_module),
    (LoneFunction, struct lone_function),
    (LonePrimitive, struct lone_primitive),
    (LoneList, struct lone_list),
    (LoneVector, struct lone_vector),
    (LoneTable, struct lone_table),
    (LoneBytes, struct lone_bytes),
    (LonePointer, struct lone_pointer),
    (LoneInteger, long)
);

And case-analyzed as follows:

void handle(LoneValue value) {
    match(value) {
        of(LoneModule, module) { /* ... */ }
        of(LoneFunction, function) { /* ... */ }
        of(LonePrimitive, primitive) { /* ... */ }
        of(LoneList, list) { /* ... */ }
        of(LoneVector, vector) { /* ... */ }
        of(LoneTable, table) { /* ... */ }
        of(LoneBytes, bytes) { /* ... */ }
        of(LonePointer, pointer) { /* ... */ }
        of(LoneInteger, integer) { /* ... */ }
    }
}

Since the of macro explicitly introduces a variable binding for each variant, such as module or function, it's now much harder to read the wrong member by mistake. The datatype encoding is also more concise than the corresponding hand-written tagged union, since a single definition produces both the tag enumeration (the counterpart of enum lone_type) and the value type (the counterpart of struct lone_value) [^2].
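For comparison, this is roughly the boilerplate one would write by hand to get the same discipline in plain C. It is a sketch only and not Datatype99's actual expansion: SmallValue, its tags, struct small_bytes, and the function names are all hypothetical, and the type is cut down to two variants. Constructors set the tag and payload together, and each case binds exactly one payload pointer in its own scope, which is the role of in match.

```c
#include <assert.h>

/* Hypothetical names throughout; reduced to two variants for brevity. */
typedef enum { SmallInteger_Tag, SmallBytes_Tag } SmallValueTag;

struct small_bytes {
    const unsigned char *data;
    unsigned long count;
};

typedef struct {
    SmallValueTag tag;
    union {
        long integer;
        struct small_bytes bytes;
    } data;
} SmallValue;

/* Constructors set tag and payload together, so they cannot disagree. */
static inline SmallValue SmallInteger(long n) {
    SmallValue v;
    v.tag = SmallInteger_Tag;
    v.data.integer = n;
    return v;
}

static inline SmallValue SmallBytes(const unsigned char *data, unsigned long count) {
    SmallValue v;
    v.tag = SmallBytes_Tag;
    v.data.bytes.data = data;
    v.data.bytes.count = count;
    return v;
}

/* Each case introduces one binding for its own payload and nothing else. */
long small_value_size(const SmallValue *value) {
    switch (value->tag) {
    case SmallInteger_Tag: {
        const long *integer = &value->data.integer;
        return (long)sizeof *integer;
    }
    case SmallBytes_Tag: {
        const struct small_bytes *bytes = &value->data.bytes;
        return (long)bytes->count;
    }
    }
    assert(0 && "unreachable: corrupt tag");
    return -1;
}
```

Leaving out a default label is deliberate here: compilers that warn about unhandled enumerators in a switch can then flag a missing variant. Everything above still has to be kept in sync by hand for every variant, which is the repetition a single datatype definition removes.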

Since Datatype99 has no run-time dependencies (not even the C standard library) and has transparent, formally specified semantics, I think it might be a great fit for Lone [^3].

Let me know if you have any thoughts or questions; I should be able to answer them.

[^1]: Auxiliary data can be added as a separate structure.

[^2]: Although both types can be manipulated separately under the names LoneValueTag and LoneValue.

[^3]: Other tagged unions in the project can be rewritten in the same way, if there are any.

Hirrolot, Dec 02 '23 09:12