scylla-rust-driver icon indicating copy to clipboard operation
scylla-rust-driver copied to clipboard

Add `flatten` attribute to derive SerializeRow

Open nrxus opened this issue 11 months ago • 19 comments

This is similar to the flatten attribute in serde.

This PR adds support for both the match_by_name and the enforce_ordering flavors but it does not allow these structs to be mix-and-matched. This means that structs of different flavors of serialization cannot be flattened into one another. This is a feasibility limitation as these two methods of serialization are completely at odds with each other and hence cannot be combined. The error produced if these two flavors are matched will be at compile time but it may not be the clearest error since it would be about the struct not implementing some doc hidden trait they wouldn't be able to see in the docs.

I have only added this attribute to SerializeRow because it was easier than DeserializeRow but I also want to add it to that macro in a future PR next chance I get to dig into this code.

All the new traits/structs/enums needed for this change are inside the _macro_internal subdmodule such that no new public API is exposed. Maybe in the future those could be made public but it felt too early to know if all the signatures were exactly how we wanted to expose them or not.

For context, I am currently dealing with an issue that if I have different insert queries where one sets N columns and another one sets the same N and one extra, then I have two make two structs with N repeated fields. With this PR I'd be able to to instead flatten the struct with N fields inside the other struct to make my code more maintainable.

By name serialization

ser::row::ByName

A new internal-only struct ser::row::ByName is added that wraps a struct that implements a new trait: SerializeRowByName. This new type has a single function ser::row::ByName::serialize and attempts to serialize an entire RowSerializationContext, returning an error if any of the columns in the context were not serialized or do not belong to the struct. This is basically the implementation of SerializeRow::serialize for any struct that implements SerializeRowByName but split into its own internal-type so that the macro doesn't have to create this shared code. This couldn't be added as a default implementation in one of our traits because we need to call for some functions using Self as a generic parameter which caused some compilation errors.

SerializeRowByName

When deriving SerializeRow using the match_by_name flavor the struct will also implement a new internal-only trait: SerializeRowByName. This trait has a single type associated type Partial, and a function partial() that creates it. The partial struct has 3 main parts:

  1. For every field that is a column (will not be flattened): A reference to the field.
  2. For every field that is a nested struct (will be flattened) The partial view of that nested struct. This means that the nested struct must also implement: SerializeRowByName such that partial() can be called on it.
  3. A hashset to check that every field (not column) has been serialized. For fields that are columns we do this via their column-name. For nested structs, we do this via the field name of the nested struct.

The partial struct is required to implement a new trait PartialSerializeRowByName

PartialSerializeRowByName

PartialSerializeRowByName has two required functions:

  • serialize_field: takes the spec of a single column and attempts to serialize the corresponding field to it. If this column does not belong to this partial struct then the caller is told that the column is unused so that the caller can instead try to use a different field for this same column (i.e., when testing to see if any nested structs can serialize to that column). If the column is used, then a check is done to see if that column has completed the serialization of this field so that it can remove it out of its missing set. The caller is informed if that column has finished the serialization of this partial struct or not.

  • check_missing: consumes the partial struct while checking if all the fields in this struct were serialized, returning an error if not. This is used inside ser::row::ByName::serialize to verify that the a struct has been fully serialized. If a field has not finished serializing and the field is a nested struct (i.e., not just a column) then we should get the error from the nested struct instead for better error messaging.

To do this signaling, a new internal-only enum ser::row::FieldStatus was added that returns whether a column was used for the field, was used and completed the field, or was used by the field is still missing more columns.

By order serialization

ser::row::InOrder

A new internal-only struct ser::row::InOrder is added that wraps a struct that implements a new trait: SerializeRowInOrder. This new type has a single function ser::row::InOrder::serialize that attempts to serialize an entire RowSerializationContext, returning an error if any of the columns in the context were not serialized or do not belong to the struct. It does this by:

  1. Wrapping an iterator over the columns in the context around a new struct ser::row::ByColumn.
  2. Calling for the generated SerializeRowInOrder implementation for the struct we are deriving SerializeRow for using the ser::row::ByColumn instance.
  3. Verifying that the the ser::row::ByColumn instance was fully consumed.

This is basically the implementation of SerializeRow::serialize for any struct that implements SerializeRowInOrder but split into its own internal-type so that the macro doesn't have to create this shared code. This couldn't be added as a default implementation in one of our traits because we need to call for some functions using Self as a generic parameter which caused some compilation errors.

ser::row::ByColumn

ser::row::ByColumn wraps an iterator over column specs and provides the following methods:

  • next: Given a value to serialize it type and name checks it against the next column spec in the iterator, serializing it if successful or returning an error therwise

  • next_skip_name: Given a value to serialize it type checks (but skips name check) it against the next column spec in the iterator, serializing it if successful or returning an error therwise

  • finish: verifies that the iterator is fully consumed.

SerializeRowInOrder

When deriving SerializeRow using the enforced_ordering flavor the struct will also implement a new internal-only trait: SerializeRowInOrder. This trait has a single method serialize_in_order() whose generated implementation will:

  1. For every field that is a column (will not be flattened): It will call for for next or next_skip_name on the given ser::row::ByColum instance.
  2. For every field that is a nested struct (will be flattened), then it will call for serialize_in_order on it (implying the nested struct must also implement SerializeRowInOrder and pass along its ser::row::ByColum instance.

Note that this method does not call for finish() on ser::row::ByColumn because it does not need to verify that the iterator was fully consumed as it could have been called during flattening and we only want to verify that the iterator is consumed on the root struct being serialized.

Pre-review checklist

  • [X] I have split my patch into logically separate commits.
  • [X] All commit messages clearly explain what they change and why.
  • [X] I added relevant tests for new features and bug fixes.
  • [X] All commits compile, pass static checks and pass test.
  • [X] PR description sums up the changes and reasons why they should be introduced.
  • [X] I have provided docstrings for the public items that I want to introduce.

nrxus avatar Dec 07 '24 09:12 nrxus

cargo semver-checks found no API-breaking changes in this PR. Checked commit: c5470a0910f9a94cff6d92837638b872bf15ab17

github-actions[bot] avatar Dec 07 '24 09:12 github-actions[bot]

I apologize about the multiple force pushes yesterday, I was chewing on it yesterday a bit more since it wasn't reviewed yet and decided to move some stuff around to make it more clear that all the new structs are internal to the macro implementation only (by moving it to that sub-module). All that should be done by now.

I have also started work on adding the flatten attribute for the enforce_order flavor as well but in a separate branch. Let me know if you have a preference on whether to put that as its own PR or push it here so that the entire support for #[flatten] for SerializeRow is all on in one PR.

nrxus avatar Dec 09 '24 18:12 nrxus

I have finished the work to also support flattening when serializing with flavor = "enforced_order". It's on a separate branch though since I wasn't sure if this PR was already considered too big or not. Let me know if you'd like me to merge it into here so you can review it all as one piece or if I should just wait until this is reviewed and merged as the changes are already large.

nrxus avatar Dec 10 '24 21:12 nrxus

@wprzytula would you mind taking a look at this PR and tell me if you would like me to keep it as-is or add the flatten support to the enforced_order flavor as well in this PR?

nrxus avatar Dec 12 '24 02:12 nrxus

@Lorak-mmk , could you take a look and let me know if there are any concerns holding this PR?

nrxus avatar Dec 17 '24 21:12 nrxus

@nrxus We're sorry for poor responsivity on our side. We're busy with next year planning; we'll be able to look at your PR later.

wprzytula avatar Dec 18 '24 17:12 wprzytula

This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(

Lorak-mmk avatar Dec 18 '24 18:12 Lorak-mmk

This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(

Are you sure? semver-checks disagrees with you:

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳 Checked commit: 4184a7b

wprzytula avatar Dec 18 '24 20:12 wprzytula

I made especially sure not to change any existing API, and to hide all of the new types/traits in the existing internal module to not increase the public API surface area other the new attribute.

nrxus avatar Dec 18 '24 20:12 nrxus

I made especially sure not to change any existing API, and to hide all of the new types/traits in the existing internal module to not increase the public API surface area other the new attribute.

In such case, we will technically be able to release it in, say, 1.1, when we find time to review and accept this after we release 1.0. Does it sound OK to you, @nrxus ?

wprzytula avatar Dec 18 '24 20:12 wprzytula

I made especially sure not to change any existing API, and to hide all of the new types/traits in the existing internal module to not increase the public API surface area other the new attribute.

In such case, we will technically be able to release it in, say, 1.1, when we find time to review and accept this after we release 1.0. Does it sound OK to you, @nrxus ?

Yep sounds good! I'll just keep pointing to my branch for now. I also have a branch to do this same support but when serializing with order enforced. Should I just merge it here so you all only have to review it as one complete feature? It'd make the overall size of the PR bigger which is why I had kept it separate

nrxus avatar Dec 18 '24 20:12 nrxus

I made especially sure not to change any existing API, and to hide all of the new types/traits in the existing internal module to not increase the public API surface area other the new attribute.

In such case, we will technically be able to release it in, say, 1.1, when we find time to review and accept this after we release 1.0. Does it sound OK to you, @nrxus ?

Yep sounds good! I'll just keep pointing to my branch for now. I also have a branch to do this same support but when serializing with order enforced. Should I just merge it here so you all only have to review it as one complete feature? It'd make the overall size of the PR bigger which is why I had kept it separate

IMO let's have it in a single PR, separate commits.

wprzytula avatar Dec 18 '24 20:12 wprzytula

This is a significant but breaking change, so we most likely won't be able to attend to it before releasing 1.0. We are quite busy with other work :(

Are you sure? semver-checks disagrees with you:

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳 Checked commit: 4184a7b

I made a typo. I definitely meant that this is NOT a breaking change, and thus we will not prioritise the review before releasing 1.0.

Lorak-mmk avatar Dec 18 '24 20:12 Lorak-mmk

@Lorak-mmk congrats on the v1.0 release! 🎉 . I have rebased this PR against the main branch so it should be good to be reviewed whenever

nrxus avatar Mar 04 '25 22:03 nrxus

This looks really useful, but wondering if the flatten attribute can also be added to derive SerializeValue, which I'm using for a bunch of UDTs I have in my app?

I have a struct that's used independently in some parts of my code, which I include in my main struct that essentially derives SerializeValue in order to implement a UDT.

blablacio avatar Mar 11 '25 17:03 blablacio

This looks really useful, but wondering if the flatten attribute can also be added to derive SerializeValue, which I'm using for a bunch of UDTs I have in my app?

I have a struct that's used independently in some parts of my code, which I include in my main struct that essentially derives SerializeValue in order to implement a UDT.

After this is merged I may take a look at it whenever I have time but since I don't use UDTs and this PR is already pretty large I would be hesitant to adding more features as part of this PR.

nrxus avatar Mar 16 '25 21:03 nrxus

🚀 🎉 🦀 This is one of the highest quality external contributions I've ever reviewed! Great job!

Thank you for the compliment and the review! (:

I have addressed all the simple comments but there are a couple of remaining questions, let me know when those are decided and I'll make any necessary changes.

nrxus avatar Mar 16 '25 21:03 nrxus

@wprzytula are there any other changes necessary for this PR?

nrxus avatar Mar 25 '25 20:03 nrxus

@wprzytula are there any other changes necessary for this PR?

Sorry for the delay, I've had a large number of reviews recently. I'm going to review it today or tomorrow.

wprzytula avatar Mar 26 '25 08:03 wprzytula

@wprzytula is there anything needed for merging?

nrxus avatar Apr 04 '25 16:04 nrxus

@wprzytula is there anything needed for merging?

I'd like to also review this. I'll try to get to it shortly.

Lorak-mmk avatar Apr 06 '25 14:04 Lorak-mmk

I know this PR is pretty big so I really appreciate the time and effort you are all taking in reviewing this and giving me some great feedback (:

nrxus avatar Apr 08 '25 16:04 nrxus

Oh, actually one more thing: in the previous review I proposed using https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-diagnosticon_unimplemented-attribute to improving error message when the user tries to use #[flatten] on a field that does not support it. I think you may have missed that because I posted that in the review summary, instead of a separate comment.

Lorak-mmk avatar Apr 08 '25 18:04 Lorak-mmk

Oh, actually one more thing: in the previous review I proposed using https://doc.rust-lang.org/reference/attributes/diagnostics.html#the-diagnosticon_unimplemented-attribute to improving error message when the user tries to use #[flatten] on a field that does not support it. I think you may have missed that because I posted that in the review summary, instead of a separate comment.

Ah, that's cool, I've never used that before. Do you have any ideas for a good error message?

For context, the current error for having an outer by-name serialization and an inner in-order serialization looks like:

error[E0277]: the trait bound `serialize::row::tests::foobar::Inner: SerializeRowByName` is not satisfied
    --> scylla-cql/src/serialize/row_tests.rs:1220:14
     |
1220 |     #[derive(SerializeRow)]
     |              ^^^^^^^^^^^^ the trait `SerializeRowByName` is not implemented for `serialize::row::tests::foobar::Inner`
     |
     = help: the following other types implement trait `SerializeRowByName`:
               &T
               RowWithTTL
               TestRowWithColumnRename
               TestRowWithColumnSorting
               TestRowWithGenerics<'a, T>
               TestRowWithNoColumns
               TestRowWithSkippedFields
               serialize::row::tests::foobar::Outer
             and 3 others
     = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `partial` found for struct `serialize::row::tests::foobar::Inner` in the current scope
    --> scylla-cql/src/serialize/row_tests.rs:1220:14
     |
1216 |     struct Inner {
     |     ------------ method `partial` not found for this struct
...
1220 |     #[derive(SerializeRow)]
     |              ^^^^^^^^^^^^ method not found in `Inner`
     |
     = help: items from traits can only be used if the trait is implemented and in scope
note: `SerializeRowByName` defines an item `partial`, perhaps you need to implement it
    --> scylla-cql/src/_macro_internal.rs:41:1
     |
41   | pub trait SerializeRowByName {
     | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `scylla-cql` (lib test) due to 5 previous errors

The inverse case looks like:

error[E0277]: the trait bound `serialize::row::tests::foobar::Inner: SerializeRowInOrder` is not satisfied
    --> scylla-cql/src/serialize/row_tests.rs:1220:14
     |
1220 |     #[derive(SerializeRow)]
     |              ^^^^^^^^^^^^ the trait `SerializeRowInOrder` is not implemented for `serialize::row::tests::foobar::Inner`
     |
     = help: the following other types implement trait `SerializeRowInOrder`:
               &T
               TestRowWithColumnRenameAndEnforceOrder
               TestRowWithEnforcedOrder
               TestRowWithSkippedNameChecks
               serialize::row::tests::foobar::Outer
               serialize::row::tests::test_flatten_row_serialization_with_enforced_order_and_skip_namecheck::InnerColumnsOne
               serialize::row::tests::test_flatten_row_serialization_with_enforced_order_and_skip_namecheck::InnerColumnsTwo
               serialize::row::tests::test_flatten_row_serialization_with_enforced_order_and_skip_namecheck::OuterColumns
             and 6 others
     = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0277`.
error: could not compile `scylla-cql` (lib test) due to 1 previous error

nrxus avatar Apr 08 '25 19:04 nrxus

Ah, that's cool, I've never used that before. Do you have any ideas for a good error message?

For context, the current error for having an outer by-name serialization and an inner in-order serialization looks like:

I took a stab at writing those messages, but I see you were faster. My commits are available at: https://github.com/Lorak-mmk/scylla-rust-driver/tree/flatten-serialize-name I'm not sure which messages are better, mine or yours. I'll let you and others (cc @wprzytula @muzarski ) decide.

Here's my proposed error message for the first case (match_by_name). The second error probably exists only because the call is not fully qualified, right? Let's definitely get rid of unqualified calls.

cargo check
    Blocking waiting for file lock on build directory

    Checking scylla-cql v1.1.0 (/home/karolbaryla/repos/scylla-rust-driver/scylla-cql)
error[E0277]: `InnerStruct` cannot be flattened here
  --> scylla-cql/src/lib.rs:27:10
   |
27 | #[derive(SerializeRow)]
   |          ^^^^^^^^^^^^ `InnerStruct` is not a struct that derives `SerializeRow` with `match_by_name` flavor
   |
   = help: the trait `SerializeRowByName` is not implemented for `InnerStruct`
   = note: There are two common reasons for that:
           - `InnerStruct` does not use `#[derive(SerializeRow)]`
           - `InnerStruct` uses `#[scylla(flavor = "enforce_order")]`
   = help: the trait `SerializeRowByName` is implemented for `OuterStruct`
   = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `partial` found for struct `InnerStruct` in the current scope
  --> scylla-cql/src/lib.rs:27:10
   |
23 | struct InnerStruct {
   | ------------------ method `partial` not found for this struct
...
27 | #[derive(SerializeRow)]
   |          ^^^^^^^^^^^^ method not found in `InnerStruct`
   |
   = help: items from traits can only be used if the trait is implemented and in scope
note: `SerializeRowByName` defines an item `partial`, perhaps you need to implement it
  --> scylla-cql/src/_macro_internal.rs:48:1
   |
48 | pub trait SerializeRowByName {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `scylla-cql` (lib) due to 5 previous errors

And here's the message for enforce_order case.

cargo check
    Checking scylla-cql v1.1.0 (/home/karolbaryla/repos/scylla-rust-driver/scylla-cql)
error[E0277]: `InnerStruct` cannot be flattened here
  --> scylla-cql/src/lib.rs:35:10
   |
35 | #[derive(SerializeRow)]
   |          ^^^^^^^^^^^^ `InnerStruct` is not a struct that derives `SerializeRow` with `enforce_order` flavor
   |
   = help: the trait `SerializeRowInOrder` is not implemented for `InnerStruct`
   = note: There are two common reasons for that:
           - `InnerStruct` does not use `#[derive(SerializeRow)]`
           - `InnerStruct` uses `#[scylla(flavor = "match_by_name")]` (which is the default)
   = help: the trait `SerializeRowInOrder` is implemented for `OuterStructOrdered`
   = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0277`.
error: could not compile `scylla-cql` (lib) due to 1 previous error

Lorak-mmk avatar Apr 08 '25 21:04 Lorak-mmk

I took a stab at writing those messages, but I see you were faster. My commits are available at: https://github.com/Lorak-mmk/scylla-rust-driver/tree/flatten-serialize-name I'm not sure which messages are better, mine or yours. I'll let you and others (cc @wprzytula @muzarski ) decide.

I am perfectly happy with your version so I just did that but I removed the do_not_recommend diagnostic attributes because those were only introduced in 1.85 and I think you all have a 6-month MSRV cycle so that'd be too recent.

Also, you were right, once I switched to fully qualifying the path then there is only one error for the first case as expected:

error[E0277]: `serialize::row::tests::foobar::Inner` cannot be flattened here
    --> scylla-cql/src/serialize/row_tests.rs:1220:14
     |
1220 |     #[derive(SerializeRow)]
     |              ^^^^^^^^^^^^ `serialize::row::tests::foobar::Inner` is not a struct that derives `SerializeRow` with `match_by_name` flavor
     |
     = help: the trait `SerializeRowByName` is not implemented for `serialize::row::tests::foobar::Inner`
     = note: There are two common reasons for that:
             - `serialize::row::tests::foobar::Inner` does not use `#[derive(SerializeRow)]`
             - `serialize::row::tests::foobar::Inner` uses `#[scylla(flavor = "enforce_order")]`
     = help: the following other types implement trait `SerializeRowByName`:
               RowWithTTL
               TestRowWithColumnRename
               TestRowWithColumnSorting
               TestRowWithGenerics<'a, T>
               TestRowWithNoColumns
               TestRowWithSkippedFields
               serialize::row::tests::foobar::Outer
               serialize::row::tests::test_row_serialization_nested_structs::InnerColumnsOne
             and 2 others
     = note: this error originates in the derive macro `SerializeRow` (in Nightly builds, run with -Z macro-backtrace for more info)

For more information about this error, try `rustc --explain E0277`.
error: could not compile `scylla-cql` (lib test) due to 4 previous errors

nrxus avatar Apr 08 '25 21:04 nrxus

I see at least one other unqualified call. Give me a moment, I'll fix them and push to my branch.

Lorak-mmk avatar Apr 09 '25 09:04 Lorak-mmk

I've pushed some changes to my branch ( https://github.com/Lorak-mmk/scylla-rust-driver/tree/flatten-serialize-name ), feel free to use it.

  • Got rid of unqualified calls
  • Moved one formatting change to a different commit. Without that make static is not passing on one commit.

I have no more issues with this version, so if you decide to use it I can approve without another review.

Lorak-mmk avatar Apr 09 '25 10:04 Lorak-mmk

I have rebased my branch against yours such that this PR now has your changes. Let me know if that's enough.

nrxus avatar Apr 09 '25 16:04 nrxus

Thanks for the approval and merge! No worries about the many rounds, it was a big change so I appreciate thoroughness!

nrxus avatar Apr 09 '25 18:04 nrxus