Add serde-style `brw(bound)` attribute
This PR adds a serde-style brw(bound) attribute. Serde also has a page further describing when this is useful: https://serde.rs/attr-bound.html. I brought this up in the Discord the other day and decided to quickly implement it as a proof of concept to better explain my thinking.
This is only a draft because I still have some unanswered questions. First, it doesn't really make sense for this attribute to be allowed on unit structs since they cannot have generic parameters, but struct and unit struct share the same Attr struct, unlike enum and unit only enum. Should I split this into two Attr structs? Is bound the only attribute that would be on struct and not unit struct, or should any other attributes be removed from the unit struct Attr struct after the split?
Second, right now, there are no default bounds added if the bound attribute is not present. I am worried that adding default inferred bounds will break existing code or complicate the implementation.
Finally, I'll wait to add more tests and docs until we have settled on the details.
Just to make sure I understand, this is so that a struct can have an unbounded generic type whilst still being able to derive the binrw implementation by passing the impl where clause through this new directive?
That is correct. I will often have a container struct that I want to be able to work with a generic type that impls read and/or write. Right now since the bounds have to go on the container struct every concrete type I use needs to impl both read and write or it won't work even if I only need one. With this new attribute since the bounds are on the impls and not the container if the concrete type impls read but not write then the container will be able to read with it and if a type implements both then it can do both.
I also sometimes need bounds to call trait functions in calc or map that I don't otherwise want on the container struct since it's overly restrictive if I don't care about using binrw with that type. This is why even if there are good inferred bounds it would be nice to have the attribute to override them.
Another thing I just thought of that needs to be figured out: I'm not sure how to handle the magic and endian traits. Should the bounds from the attribute be used there too?
First, it doesn't really make sense for this attribute to be allowed on unit structs since they cannot have generic parameters, but struct and unit struct share the same Attr struct, unlike enum and unit only enum. Should I split this into two Attr structs?
I would not split this. A unit struct is just a struct with less stuff, whereas a unit enum has some actually different features.
Second, right now, there are no default bounds added if the bound attribute is not present. I am worried that adding default inferred bounds will break existing code or complicate the implementation.
It would be somewhat non-trivial, since e.g. struct Foo<T> { #[br(calc(…))] t: T } does not necessarily require T to be BinRead, and BinWrite needs some different logic, but I’m not sure the automatic derivation would be too difficult to get right. I think the algorithm is just:
- If a
whereclause exists, just use that; otherwise - For each field with a generic type: a. If read, and no calc/map/default/parse_with, add the predicate; b. If write, and no write_with, add the predicate.
Right? And if someone expects a T used as an input to map/parse_with/etc. to have a binrw trait, then they have to specify that with the bound directive.
I don’t feel very concerned about breaking existing code by implementing this, especially since the bound directive ensures there is always an escape hatch even if the first implementation is bugged somehow. My guess is folks using generics already are probably just specifying the bounds for the whole struct.
Another thing I just thought of that needs to be figured out: I'm not sure how to handle the magic and endian traits. Should the bounds from the attribute be used there too?
I would not, unless those traits can’t compile without it.
Oh, and for T::Args I think it would need to be:
- If no args/args_raw, specify
for<'a> T::Args: Default; otherwise - Emit a diagnostic that “`args` on generic fields requires `bound`”, since proc-macros do not have access to type information so can’t determine the types passed to
argsto generate a concrete type bound.
Thanks for the feedback. That all sounds great to me. Serde has some complicated logic for walking the AST to find uses of generic type parameters, but the predicate generation doesn't seem so bad. T::Args has the potential to complicate things, but if we require hand-written bounds when necessary, like you suggested, then hopefully it's manageable. I'll let you know when I have something else to review.
I've hit a wall deciding how to handle the validation. Right now, I do the whole AST walk and decide which generic types need to have where clauses generated in the codegen step, well after validation, which means I don't have all the information needed to generate the diagnostic for args without bounds during the validation step. I could do a lighter but similar pass during validation, which would duplicate work somewhat. Or I could do the full pass before validation and use the same result for validation and codegen. Do you have any suggestions for what would best fit the binrw architecture here?
Also, I'm largely happy with the code that is in my fork right now, if you wanted to give it a quick look over, understanding that it may still change.
Thanks for working on this! I will have a closer look at the code in a bit to ensure I have a better shared context for the discussion, but just replying quickly to your last comment, what cases fail by simply matching each field’s ty to the generic parameters such that an AST walk is required? The only one I can think of offhand is if a generic type is used within another generic (e.g. field: Vec<T>) would require walking to discover T being used in that position. Is there something else?
Vec<T> is a good example. Some more examples of fields using generic type parameters can be found in this struct I've been playing with:
trait Wrapper<T> {
type Item;
}
struct LotsOfTypes<A, B, C: Wrapper<D>, D, E, F, G> {
a: A,
b: Vec<B>,
c: <C as Wrapper<D>>::Item,
e: [E; 4],
f: (F, G),
}
And that is only scratching the surface of what Rust allows you to do. I'm trying to mirror Serde's behavior as closely as possible since I don't want to break anyone's intuition, given how similar the feature is to the one found in Serde.
Sorry about the noise from CI, I’ve rebased the branch to fix the failures and am going to see how far I can get on reviews tonight.
Thank you for putting so much thought into this. I've started working to incorporate your feedback.
Regarding the field-level attribute, I added it because serde has it. It allows you to override the inference for one field without having to explicitly specify the bounds that would have been inferred for the other generic types used in other fields. Without the field-level attribute, if you have a struct with many generic types and the bounds are mostly inferred correctly except for one field, you would have to specify all the bounds for all generic types, even the ones that would have been inferred correctly, at the top of the struct. And anyone reading the code would have to read through all the explicitly specified bounds and not miss the one that is different from what would have been inferred.
I'd vote to keep the field-level attribute because:
- serde has it, and this feature is based closely enough on serde's bound attribute that not having it would be a violation of the principle of least astonishment in my eyes.
- As someone who writes a lot of generic structs, I can see why it might be useful in keeping code concise and easier to read.
- It doesn't add much complexity to the implementation.
- Users have the option to ignore it and put their bounds at the top of the struct/enum anyway if they want.
That said, if you disagree, I can remove it.
The argument for avoiding redundancy is persuasive, it just seems to me like a weird way to do it because a generic parameter and a field aren’t one-to-one, so when the same type is used on multiple fields, then what? I would think of solving the problem of “I want to keep some of the inferred bounds” either by T: infer in the bound directive, or by putting the annotations on the parameters rather than the fields. I understand the principle of least surprise and I guess I can’t say that this alternative suggestion is not so clearly better that it is worth doing something different, but I also just loathe importing designs with what seem to be kind of obvious deficiencies just because the other thing is popular.
I did try to do a survey of generics use on GitHub to decide how much I should care by running searches for lang:rust /(?:struct|enum)\s+\w+<([^>]+,){N}[^,>]+>\s+\{/ where N=0 to 5,. I found 96.9% of structs/enums use 3 or fewer total parameters and 99.4% use four or fewer. This includes lifetimes and const parameters because search complexity limits meant I was unable to filter these. So, the number of users who would be harmed or benefitted from whatever choice is made is probably very low.
It can always be changed later so I’m not going to make you change or remove it unless I’ve absolutely thrilled you with my alternative syntax suggestion and/or sus data analysis. :-) Thanks for working on this!
Finally found some time to pick this up again. It should be more in line with what you were talking about now. I wanted to get some more feedback from you about this direction before I started adding a bunch of tests. I removed the field-level and enum variant bounds for the time being, since it was getting way too complicated to implement with this new architecture, and you were against it.
One annoying thing is the args error. It will break existing correct code just because it doesn't use the bound attribute. One thing that we could do is try to parse the generic bounds to see if the type already has a GAT Args bound and not generate bounds or give an error for it. This would probably still break some existing code, and it could allow some broken code through and lead to bad error messages.
Also, since we would be adding the Default bound to all the generated args predicates, this could break existing code that has explicitly written out bounds for args that do not require Default. But we could do the check for Args described above and avoid generating bounds to fix this in most cases.
Even just adding the implicit BinRead/BinWrite bounds has the potential to break stuff. We could avoid the implicit bounds on structs/enums with any already specified, which would be annoying for new code because most structs with generic type parameters are going to have a where clause, so you'd end up being required to write the bound attribute out pretty much every time. And even then, you would still break something that intentionally did not bind the type to those traits.
It seems like the only way to do this without a major version bump is to leave out the implicit stuff. If this got merged as is, all of my binrw projects would fail to compile. I really don't feel good about that. Maybe we should hold off the implicit stuff until there are more reasons for a major version bump. Sorry for rambling, I'm interested to hear your feelings about all this.
P.S. I think the failing ui tests are because of changes to the output in nightly and not because of my changes.
I am currently over-encumbered and don’t have a timeline for when I will be able to review next. Maybe someone else can peek? ? ? ????? ?
P.S. I think the failing ui tests are because of changes to the output in nightly and not because of my changes.
Yes, that is how it goes. Someday proc_macro_span will stabilise and the UI tests can stop using nightly and therefore maybe change a little less often, possibly… :-)