wit-bindgen Clean up some internals

I've cleaned up some idiosyncrasies of how wit-bindgen represents types, and also some other things surrounding that.

The main thing that I've changed in this PR is to distinguish between 'named' and 'anonymous' types. Named types are proper types defined by the user (record, variant, etc.) whereas anonymous types are types defined via generics (option<T>, expected<T, E>, etc.).

Previously, these were represented with a single TypeDef type, with an optional name field. This led to some weird edge cases of types:

A named type with no name - illegal.
An anonymous type with a name - this was interpreted as a redundant way of defining a type alias.

I replaced TypeDef with an enum CustomType which can be either a named or anonymous type.

I ran into one major issue with this change, which was that plain resource types were implemented by creating a nameless type alias to a handle to that resource, and then storing it in the type lookup table. I worked around this by instead making the type lookup code check both the type and resource lookup tables directly.

A couple of other little changes sprouted off of this:

I renamed type aliases from NamedTypeKind::Type to NamedTypeKind::Alias, since I think that's clearer.
The markdown backend had a strange skip_name option in its print_ty method, which basically just caused it to panic if the printed type was a named type. It was set to true when printing an aliased type. I'm not sure what exactly it was trying to do, but I removed it.
Generator had methods for defining anonymous types, except that they took non-optional names, and so they just acted as an unnecessary extra way to define type aliases. I replaced them with methods which actually define nameless anonymous types.

The only language which doesn't already have those types built in and needs to define them is C. It was previously keeping track of them manually, so I reworked the C backend on top of the new methods.

This did lead to a slight regression in its behaviour, though - the manual system would also keep track of which types were actually used, and not include the unused ones. For example, a function returning expected<string, errno> creates a C function which returns an errno and has an out-pointer for a string. This doesn't actually use the expected type itself, and so it would previously have been removed. With this change, it isn't, but that's already the case with lots of the other backends anyway, so I think that's fine; I'm okay to revert that bit if you think it matters though.

Jun 05 '22 11:06 Liamolucko

Thanks for the work here! I think it might be good to get some thoughts from others on this change as well at a high level.

My original intention with the wit-parser library was to have the AST pretty closesly match the AST of components themselves where types can have optional names. This is however a significant deviation from that where some types are blessed to not need names but other types require them. This hypothetically means that taking an arbitrary component off the shelf and generating bindings for it is more difficult since we'd be left to bridge the gap.

On the other hand though this reflects the AST structure of *.wit as-is today where there are clearly defined anonymous types and ones that are required to have names. As can be seen here this removes a lot of panics internally from the code generators which aren't ever hit by statically ruling out the cases (and this is really nice!).

Perhaps the best way to approach this is to take the naming issue upstream to the component-model repository? I know that the intention is that *.wit should be one-to-one with the component model and restrictions we want here are also restrictions in the component model binary format itself (e.g. layers on validation). It might be good to settle this upstream around the naming issue and reach a consensus there before refactoring internally here?

As a side note though each individual language will still need to deal with whatever semantics we settle on. C for example I think still needs the code it internally has for public/private types since the C API differs from the listed interface types API such as results getting "exploded" and the T becomes an out-ptr with the E being returned by-value. Additionally as you've seen keeping private types private rather than generating them in public headers I think is important and ideally not something we should gloss over.

Jun 07 '22 13:06 alexcrichton

This hypothetically means that taking an arbitrary component off the shelf and generating bindings for it is more difficult since we'd be left to bridge the gap.

Practically all languages have this requirement anyway, so I think it's fine to validate it at the WIT level rather than making per-language bindings do it manually.

Perhaps the best way to approach this is to take the naming issue upstream to the component-model repository? I know that the intention is that *.wit should be one-to-one with the component model and restrictions we want here are also restrictions in the component model binary format itself (e.g. layers on validation). It might be good to settle this upstream around the naming issue and reach a consensus there before refactoring internally here?

I don't really think this should be a fundamental part of the component model, that would be an odd limitation.

Also, this isn't the only place that WIT has extra requirements on top of the component model - it also requires that identifiers be kebab-case.

C for example I think still needs the code it internally has for public/private types since the C API differs from the listed interface types API such as results getting "exploded" and the T becomes an out-ptr with the E being returned by-value. Additionally as you've seen keeping private types private rather than generating them in public headers I think is important and ideally not something we should gloss over.

Okay, I've changed the C backend back to how it was.

Jun 11 '22 08:06 Liamolucko

To clarify I think that the component model and wit should be aligned, or at least that's the intention. If wit requires something thing it should also be required by the component-model and vice-versa. That was at least the original intention with what's going on here. I'm sorry in that I don't want to blow up this PR too much but I do feel that it might be good to settle this issue with the component model spec before making changes here. Effectively I think it would be good to get agreement on what the expectations are for taking a component off-the-shelf and generating reasonble bindings for it. If that requires changes to the component model I think it would be good to get agreement on the rough shape of the changes before making changes here in wit-bindgen (to avoid this refactoring and chasing different goals at different points in time)

Jun 13 '22 14:06 alexcrichton

On the note of component model alignment, we've proposed standardizing WIT with the component model: https://github.com/WebAssembly/WASI/issues/484

Jun 17 '22 19:06 ricochet

I'm going to close this PR as it's pretty outdated at this point and lots has changed in the interim. I think that some of these refactorings may still be worthwhile but unfortunately I believe that they'd need to be restarted from scratch given all the changes in the interim.

Jan 27 '23 16:01 alexcrichton

wit-bindgen wit-bindgen copied to clipboard

Clean up some internals

wit-bindgen
wit-bindgen copied to clipboard