capnproto-rust
Refactor code generation
The code generation is very bespoke, and I think there's significant scope for simplifying this, but it would be quite a lot of work, so I wanted to gauge the appetite for this before I start.
Currently the code is generated as a formatted string, using a custom code-gen framework.
I propose generating an abstract syntax tree representation instead, and then formatting it at the end.
That would allow you to use Rust's standard tooling (syn and quote) to generate the AST. You could then format the output using prettyplease or by shelling out to rustfmt.
I agree that the capnpc could use a lot of refactoring. I don't think we need to bring in new dependencies, though. quote seems like overkill. We don't need to make an AST. We just need to print the generated code as a string.
This library already requires you to set up an environment where you can shell out to capnp so I don't see a point in adding inconvenience to avoid shelling out to rustfmt.
Automatically running the output through rustfmt sounds great though. I think it could let you significantly simplify FormattedText. Maybe it could just be a Vec<String>?
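To make the idea concrete, here is a minimal sketch of what that could look like, using only the standard library. The names `GeneratedCode`, `run_rustfmt` and `format_or_fallback` are hypothetical and do not exist in capnpc; the sketch also assumes a `rustfmt` binary is on the PATH and falls back to the unformatted text when it is not.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Hypothetical stand-in for a simplified `FormattedText`: a flat list of
/// lines with no indentation tracking, since rustfmt handles the layout.
pub struct GeneratedCode {
    lines: Vec<String>,
}

impl GeneratedCode {
    pub fn new() -> Self {
        Self { lines: Vec::new() }
    }

    /// Appends one line of generated source, without any indentation.
    pub fn line(&mut self, text: impl Into<String>) -> &mut Self {
        self.lines.push(text.into());
        self
    }

    /// Joins the lines into one newline-terminated source string.
    pub fn to_unformatted(&self) -> String {
        let mut source = self.lines.join("\n");
        source.push('\n');
        source
    }
}

/// Pipes `source` through `rustfmt` via stdin and returns its stdout.
/// Assumes `rustfmt` is installed and reachable through PATH.
pub fn run_rustfmt(source: &str) -> io::Result<String> {
    let mut child = Command::new("rustfmt")
        .arg("--edition")
        .arg("2021")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;

    // The temporary stdin handle is dropped at the end of this statement,
    // which closes the pipe and lets rustfmt see end-of-input.
    child
        .stdin
        .take()
        .expect("stdin was configured as piped")
        .write_all(source.as_bytes())?;

    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "rustfmt failed"));
    }
    String::from_utf8(output.stdout)
        .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err))
}

/// Formats with rustfmt when possible, otherwise returns the input as-is.
pub fn format_or_fallback(source: &str) -> String {
    run_rustfmt(source).unwrap_or_else(|_| source.to_string())
}
```

With something like this, the generator would only push lines and call `format_or_fallback(&code.to_unformatted())` once at the end. Whether a missing rustfmt should be a silent fallback or a hard error is a design choice that this sketch does not settle.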
It is difficult for the end-user to get a grip on the code generation process, partly because the to-be-interpolated code lacks visual clarity:
https://github.com/capnproto/capnproto-rust/blob/9136f83ac733cfc33351b6993e5a78b27f0b4667/capnpc/src/codegen.rs#L1497-L1510
Using a procedural macro which interpolates the variables (like for instance the one in the quote crate) would result in much shorter and much more readable code:
    quote_like! {
        #[inline]
        pub fn which(self) -> ::core::result::Result<#concrete_type, #capnp::NotInSchema> {
            match self.#field_name.get_data_field::<u16>(#doffset) {
                #getter_interior
            }
        }
    }
In principle (as per the documentation) users should have the flexibility to add functionality to the codegen (for instance with custom annotations) and in many cases this would require forking and modifying the existing codegen. I would therefore argue that the codegen needs to be especially transparent and comprehensible.
Using the quote library might initially appear to be overkill (it is true that, in principle, no AST is required in the output), but it can serve as a reliable intermediate data format, facilitating future development and maintainability. The quote library's documentation even provides a section on non-macro code generation, so it does seem like an acceptable use case.
Of course, there might be a performance penalty, and relying on another dependency would undoubtedly increase compile times.
Another option would be to write a custom procedural macro, similar to quote!, that only performs the interpolation.
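As a rough illustration of the "interpolation only" idea, here is a sketch that substitutes `#name` placeholders in a template string. It is a runtime function, not a procedural macro, so it only stands in for the concept. The function name `interpolate` and its error type are assumptions made for this example; nothing like it exists in capnpc or quote.

```rust
use std::collections::HashMap;

/// Replaces every `#name` placeholder in `template` with the matching value
/// from `vars`. A `#` that is not followed by an identifier character (as in
/// `#[inline]`) is copied through unchanged. Unbound placeholders are errors.
pub fn interpolate(template: &str, vars: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;

    while let Some(pos) = rest.find('#') {
        // Copy everything before the `#` verbatim.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];

        // The placeholder name is the longest run of [A-Za-z0-9_] after `#`.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..len];

        if name.is_empty() {
            // Not a placeholder, e.g. the `#` of an attribute.
            out.push('#');
        } else {
            match vars.get(name) {
                Some(value) => out.push_str(value),
                None => return Err(format!("unbound placeholder `#{}`", name)),
            }
        }
        rest = &after[len..];
    }

    out.push_str(rest);
    Ok(out)
}
```

A real procedural macro would resolve the placeholders at compile time and could report unbound names as compiler errors, which this runtime version can only do when the generator runs.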