Redesign `DeserializerAdapter` trait to support deserializing using `DeserializeSeed`
Since the CDR format is not self-describing, the target type needs to be known for deserialization. The current RustDDS implementation ensures this by requiring a serde::Deserialize implementation for all target types. Unfortunately this makes it impossible to deserialize when the type information is only available at runtime. For example, it is not possible to base the deserialization on type information parsed from ROS2 msg files at runtime.
This pull request solves this limitation by adding basic support for deserialization based on serde's DeserializeSeed trait, which allows to use additional state for deserialization.
The main changes are:
- Add a new
Decodetrait for deserializing a byte slice with access to theRepresentationIdentifier. The trait is implemented for the following new types:-
PlCdrDeserializeris an empty newtype that implementsDecodefor all types that implementPlCdrDeserialize. -
CdrDeserializeDecoderis an empty newtype that implementsDecodefor all types that implementserde::Deserialize. -
CdrDeserializeSeedDecoderwraps aDeserializeSeedimplementation and implementsDecodeusing the seed for deserialization.
-
- Redesign the
DeserializerAdaptertrait- Add a
from_bytes_withmethod that takes an additionaldecoder: impl Decodeargument. The method has a default implementation that calls the decoder. - Add a new associated type named
Deserialized, which describes the type that the decoder will deserialize to.- Most of the time, the type is equal to the generic type
D. - Some adapters, mainly
DAWrapper, need to do an additional transformation after deserializing. This transformation is possible by implementing a newtransform_deserializedtrait method. For example,DAWrapperwraps the deserialized value into aNoKeyWrapper.
- Most of the time, the type is equal to the generic type
- Create a new
DefaultDecodersub-trait, which describes adapters that can be used without supplying a decoder (since they provide a default one).- Gate the
from_bytesandfrom_vec_bytesmethods on theDefaultDecodertrait and provide a default implementation using the*_withvariants and the default decoder. - The
DefaultDecodertrait is implemented:- for
CDRDeserializerAdapterifDimplementsserde::Deserialize. - for
PlCdrDeserializerAdapterifDimplementsPlCdrDeserialize.
- for
- Gate the
- Add a
- Use the
DefaultDecodertrait to support both stateful and stateless deserialization in most functions, without duplicating the code - Create new
try_take_one_withandas_async_stream_withAPI methods to allowros2-clientand others to plug in custom stateful deserializers.
I tried my best to limit the scope of this PR as much as possible and to document all the changes. Unfortunately, I failed to make this a non-breaking change, mainly because of the new DeserializerAdapter design. I hope you still consider this PR, as this feature is critical for our use case in the dora project.
Please let me know if you have any questions or ideas for improvements!
(edit: The commit list got a bit messy, sorry. I can squash if you like.)
With these changes, we can update ros2-client to provide take_seed and async_stream_seed functions on Subscription. I opened a draft PR for this at https://github.com/jhelovuo/ros2-client/pull/20 to show the required ros2-client changes.
I'm not sure why the "Code Coverage" CI job fails, but it appears to be broken on the master branch as well: https://github.com/jhelovuo/RustDDS/actions/workflows/coverage.yml . So I don't think that the failure is related to this PR.
Hello,
Thank you for the interesting idea and PR. I was not aware of the DeserializeSeed trait before. I can see the use case here.
If the deserialization would be changed as you propose, how would the serialization go then?
And no need to worry about the code coverage CI job.
Serialization doesn't need to be changed, since the serde::Serialize trait gives access to the runtime value that should be serialized. So we can use the type information parsed from ROS2 message files in a custom Serialize implementation.
For example:
pub struct TypedPythonValue<'a> {
pub value: &'a pyo3::Any, // given from Python
pub type_info: &'a Ros2TypeInfo, // parsed from ROS2 message files
}
impl serde::Serialize for TypedValue<'_> {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
match &self.type_info.data_type {
DataType::UInt8 => {
// the ROS2 topic expects an `u8` value, so try to convert the Python value
let value: u8 = self.value.extract().map_err(serde::ser::Error::custom)?;
serializer.serialize_u8(value)
}
.... // handle other ROS2 data types
_ => todo!(),
}
}
}
This approach doesn't work with the standard serde::Deserialize trait because it is completely stateless, i.e. there is no self argument. So for deserialization based on runtime type information, we need to use DeserializeSeed.
Codecov Report
Attention: Patch coverage is 52.17391% with 66 lines in your changes are missing coverage. Please review.
Project coverage is 73.26%. Comparing base (
28ed26f) to head (4e8e37d). Report is 130 commits behind head on master.
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## master #313 +/- ##
==========================================
- Coverage 73.53% 73.26% -0.28%
==========================================
Files 94 95 +1
Lines 19845 20192 +347
==========================================
+ Hits 14594 14793 +199
- Misses 5251 5399 +148
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Thanks for keeping your PR up to date. I assume this means you are using it for yourself.
We are considering accepting this PR, even though it adds extra complication. We may have another use case where it will be useful. This is just to let you know that the PR is not dead.
Thanks for the update! Yes, we're using this PR for the https://github.com/dora-rs/dora project. We need it because we precompile Python bindings on a different machine, which need to be able to deserialize ROS2 messages based on type information parsed at runtime.
Looking at the central part of the new design, namely, new code to adapters.rs , terms "decode" and "transform" are used. Are these interchangeable? If yes, perhaps only one term should be used for clarity?
Looking at the central part of the new design, namely, new code to
adapters.rs, terms "decode" and "transform" are used. Are these interchangeable? If yes, perhaps only one term should be used for clarity?
This question makes no sense. Apparently, "decode" and "deserialize" are more like synonyms. So the pipeline is
byte slice → deserialize/decode → value of type Self::Deserialized → call
[Self::transform_deserialized] → value of type D
And it makes sense to use "decode", because the term "deserialize" is already taken.
Am I now on the correct track?
Another question: What is the use case for transform_deserialized() ?
In this PR and the corresponding ros2-client PR it always seems to be the identity function.
I just created a new branch deserialize-with-seed for developing this feature. Please redirect the PR there.
Is there a specific reason in
fn from_bytes_with<S>(
input_bytes: &[u8],
encoding: RepresentationIdentifier,
decoder: S,
) -> Result<D, S::Error>
where
S: Decode<Self::Decoded>,
{
...
}
for the decoder to be passed by value rather than reference? Correspondingly, Decode::decode_bytes() takes self rather than &self. (Or maybe &mut self?)
If using a reference is possible, then we could get rid of the Clone requirement in DefaultDecoder::Decoder. Requiring Clone is a bit awkward, because it would really do the cloning at every sample processed.
So the pipeline is
byte slice → deserialize/decode → value of type
Self::Deserialized→ call [Self::transform_deserialized] → value of typeD
Yes, exactly!
Another question: What is the use case for
transform_deserialized()?
There are some adapters that add a wrapper around the deserialized values. With the normal Deserialize trait, this could happen behind the scenes because the adapters could just call deserialize on the desired type. With DeserializeSeed, the adapters need to explicitly specify the type that the DeserializeSeed instance should create, which might be different than the output type D.
In this PR and the corresponding
ros2-clientPR it always seems to be the identity function.
Yes, it's almost always the identity function. The one exception is no_key::DAWrapper, which wraps a DeserializerAdapter<D>, but returns a DeserializerAdapter<NoKeyWrapper<D>>. So it's transform_deserialized call wraps the result in a new NoKeyWrapper instance:
NoKeyWrapper::<D> {
d: DA::transform_deserialized(deserialized),
}
(This implementation is at src/dds/no_key/wrappers.rs#L94-L96.)
I just created a new branch deserialize-with-seed for developing this feature. Please redirect the PR there.
I just tried, but it looks like all the commits of this PR are already part of the branch.
Is there a specific reason in
fn from_bytes_with<S>( input_bytes: &[u8], encoding: RepresentationIdentifier, decoder: S, ) -> Result<D, S::Error> where S: Decode<Self::Decoded>, { ... }for the
decoderto be passed by value rather than reference? Correspondingly,Decode::decode_bytes()takesselfrather than&self. (Or maybe&mut self?)If using a reference is possible, then we could get rid of the
Clonerequirement inDefaultDecoder::Decoder. RequiringCloneis a bit awkward, because it would really do the cloning at every sample processed.
I just added that code last week to get things working again with the latest changes from master. I'm not satisfied with that part of the implementation either, so it would be great if we could find a better solution.
The reason for https://github.com/jhelovuo/RustDDS/pull/313/commits/4e8e37d6405a21f6d54c1d1255e7ae1d65f41720 was the loop introduced in https://github.com/jhelovuo/RustDDS/commit/6908f40f68ddebbcc3f059b02cb5f7e64cac4986, which calls deserialize_with multiple times.
I agree that taking a (mutable?) reference to the decoder would be the cleaner solution and would be fine for our use case. Unfortunately, the DeserializeSeed::deserialize method takes self by value, so I'm not sure how we could avoid a Clone bound...
(Side note: If your DeserializeSeed type T only needs &self, you could implement DeserializeSeed for &T as well. Then the .clone() call would be just a pointer copy.)
So the pipeline is byte slice → deserialize/decode → value of type
Self::Deserialized→ call [Self::transform_deserialized] → value of typeDYes, exactly!
Ok. Your technical implementation is brilliant. This is quite complicated abstract metaprogramming.
In order to make it more accessible to others, I decided to rename some things. The basic renaming idea is
deserialize = decode + transform.
In your original code, there was confusion between deserialize and decode, as these were used as synonyms. Now it should be more clear that the whole two-step process is "deserialize" and the first step is called "decode".
I am migrating this PR to PR #334 , so we can continue the work there.
Is there a specific reason in
...code...
for the
decoderto be passed by value rather than reference? Correspondingly,Decode::decode_bytes()takesselfrather than&self. (Or maybe&mut self?) If using a reference is possible, then we could get rid of theClonerequirement inDefaultDecoder::Decoder. RequiringCloneis a bit awkward, because it would really do the cloning at every sample processed.I just added that code last week to get things working again with the latest changes from
master. I'm not satisfied with that part of the implementation either, so it would be great if we could find a better solution.The reason for 4e8e37d was the
loopintroduced in 6908f40, which callsdeserialize_withmultiple times.
The new loop was logically necessary, because it can happen that in a with_key DDS topic, e.g. most of the Discovery topics, we could receive a Dispose instead of data update in a DATA submessage. The Dispose is indicated by sending a Key instead of Data, or alternatively, sending an MD5 hash of the Key.
In case we, as a DataReader, join an already long-running Topic, it may have some past data values published that we can no longer receive, because the History buffers are not deep enough. But we can at some point receive Dispose notifications of these samples we have never heard of. In case the Dispose was via a hash only, there is no way we can tell the application about the Dispose, because we do not know the actual key, so it is best to just ignore it, and move on to the next incoming sample. This is why the looping is needed.
This is something we could perhaps work around to avoid the need for cloning.
I agree that taking a (mutable?) reference to the decoder would be the cleaner solution and would be fine for our use case. Unfortunately, the
DeserializeSeed::deserializemethod takesselfby value, so I'm not sure how we could avoid aClonebound...(Side note: If your
DeserializeSeedtypeTonly needs&self, you could implementDeserializeSeedfor&Tas well. Then the.clone()call would be just a pointer copy.)
Ok, the pass-by-value of self in DeserializeSeed::deserialize is something we cannot work around.
Then we must keep the Clone bound, and document that there will be quite a lot of cloning, about one .clone() per deserialized sample, so the application designer knows to invest in making it cheap, like a pointer copy.
Thanks a lot!