rust-bindgen icon indicating copy to clipboard operation
rust-bindgen copied to clipboard

Using `syn` for code generation instead of `quote`

Open pvdrz opened this issue 3 years ago • 3 comments

This is a rather loose proposal but I'd like to start this discussion as it might provide a more straightforward way to handle issues like https://github.com/rust-lang/rust-bindgen/issues/564 and https://github.com/rust-lang/rust-bindgen/issues/1743. I've seen that most of the code generation is done relying on quote! which is a very straightforward and simple way to emit rust code.

However, tacking any issue that requires reordering or modifying the generated code would require several workarounds due to the unstructured nature of proc_macro2::TokenStream.

I was wondering if it would be feasible and useful use syn::parse_quote! and other tools provided by the syn crate for code generation in order to keep all the generated code in a more structured way before emitting it so bindgen can manipulate it in later stages in an easier way.

cc @amanjeev

Edit: I forgot to mention that one advantage of being able to handle all the items generated by bindgen as syn types is that we could implement "passes" in a modular way so we can easily enable/disable them. So for example, we could deduplicate extern "C" { ... } blocks in one pass and independently write another pass to reorganize items. In general, it would help a lot when having to handle generated code in my opinion.

Another important point is that we should be able to do this transition incrementally as we can always produce TokenStreams from the syn data structures by using the ToToken trait.

However, we think it's pretty important to know if this change would be worth it from the maintainers point of view or if a less intrusive way to solve the issues mentioned above would be better.

pvdrz avatar Aug 08 '22 19:08 pvdrz

we think it's pretty important to know if this change would be worth it from the maintainers point of view or if a less intrusive way to solve the issues mentioned above would be better.

@emilio surely knows a lot more about the tradeoffs than I do. Personally, I have only ever written ordinary macros, and no procedural ones, so my opinion should not carry much weight, but to the level that I understand this issue's proposal, it sounds worthwhile to me.

Another important point is that we should be able to do this transition incrementally

An incremental approach, using relatively small, reversible changes, is a strongly favorable thing for me.

kulp avatar Aug 09 '22 18:08 kulp

Hi @kulp thanks for you answer :)

An incremental approach, using relatively small, reversible changes, is a strongly favorable thing for me.

Yeah I think this is definitely possible, We could take functions and self-contained sections of code like this:

fn top_level_path(
    ctx: &BindgenContext,
    item: &Item,
) -> Vec<proc_macro2::TokenStream> {
    let mut path = vec![quote! { self }];

    if ctx.options().enable_cxx_namespaces {
        for _ in 0..item.codegen_depth(ctx) {
            path.push(quote! { super });
        }
    }

    path
}

and replace them by

fn top_level_path(
    ctx: &BindgenContext,
    item: &Item,
) -> Path {
    let mut segments = Punctuated::<PathSegment, Colon2>::new();

    segments.push(parse_quote!(self));

    if ctx.options().enable_cxx_namespaces {
        for _ in 0..item.codegen_depth(ctx) {
            path.push(parse_quote!(super));
        }
    }

    Path {
        leading_colon: None,
        segments,
    }
}

Then we can replace each use of top_level_path to either handle this as an actual Path (like when you want to write quote!(use #path)) or as a token stream (by calling path.into_token_stream()). This could be done in a contained and incremental manner in several places.

One aspect I'm not sure about is performance issues so we would also have to consider this.

pvdrz avatar Aug 09 '22 20:08 pvdrz

Can you elaborate what use cases do you envision for this? And why are they not addressable for example if you use parse_quote! on the output of bindgen?

emilio avatar Aug 10 '22 11:08 emilio

it would be definitely possible to use the output of codegen::codegen and doing parse_quote!(mod something { #( #items )* }), and then taking the contents of the module but it feels a bit hacky imo. Parsing each individual "item" in the output wouldn't be as straightforward because each "item" can be actually more than one item (For example, structs/enums and their impls are on the same "item").

pvdrz avatar Aug 10 '22 15:08 pvdrz

That being said, moving to syn completely could be used to do some of these fixes "earlier" in code generation. But for the two issues I mentioned above, we can just parse the codegen output and do it afterwards.

pvdrz avatar Aug 10 '22 15:08 pvdrz

Given that https://github.com/rust-lang/rust-bindgen/pull/2254 is close to being merged I'm marking this as solved

pvdrz avatar Aug 19 '22 15:08 pvdrz