serde_with icon indicating copy to clipboard operation
serde_with copied to clipboard

feat(serde_with): make `BytesOrString` adjustable

Open sivizius opened this issue 1 year ago • 6 comments

At the moment, any Vec<u8> annotated with #[serde(as = "BytesOrString")] will always be serialised as an array of integers in the range 0–255. However, often these bytes are valid strings, the internal representation as Vec<u8> is often just an implementation detail. This PR adds a generic type parameter PREFERENCE, which must implement the new trait TypePreference, which is a subtrait of SerializeAs<[u8]>. This parameter defaults to the new marker-strut PreferBytes, which serialises Vec<u8> as an array like before. Two additional marker-structs are PreferString, which tries to convert &[u8] to &str first and will fallback to serialising as array only if the bytes do not represent a valid string, as well as PreferAsciiString, which additionally checks that all characters of the string are valid ASCII-characters, otherwise falling back to the array as well.

sivizius avatar Oct 10 '24 12:10 sivizius

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 67.59%. Comparing base (2d70b70) to head (e030aa0). Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage   67.34%   67.59%   +0.25%     
==========================================
  Files          40       40              
  Lines        2468     2509      +41     
==========================================
+ Hits         1662     1696      +34     
- Misses        806      813       +7     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 10 '24 12:10 codecov[bot]

Given this PR is #794 really necessary? The Bytes type does not really have any string association, but is primarily to enable calling the (potentially) optimized serialize_bytes function.

jonasbb avatar Oct 10 '24 20:10 jonasbb

Given this PR is #794 really necessary? The Bytes type does not really have any string association, but is primarily to enable calling the (potentially) optimized serialize_bytes function.

It was more like an obvious addition to this PR. However, making this trait work for Bytes as well somewhat changed the serialisation behaviour of BytesOrString for the PreferBytes case: It will call serialize_bytes. Is this a breaking change?

sivizius avatar Oct 10 '24 20:10 sivizius

It will call serialize_bytes. Is this a breaking change?

Oh yeah, that is changing the behavior. In JSON it is not visible, as there is no efficient way to store binary data. There are some tests for Bytes using assert_tokens.

jonasbb avatar Oct 10 '24 23:10 jonasbb

It will call serialize_bytes. Is this a breaking change?

Oh yeah, that is changing the behavior. In JSON it is not visible, as there is no efficient way to store binary data. There are some tests for Bytes using assert_tokens.

It’s not breaking for Bytes, I adjusted the SerdeAs<[u8]> implementation of PreferBytes to match the current behaviour. It has however changed it for BytesOrString though, which used source.serialize(serializer) before and uses serializer.serialize_bytes(source) now.

sivizius avatar Oct 11 '24 11:10 sivizius

The suggested changes (serialize_bytes → serialize) would break #794. However, as you do not seem to be convinced of that PR anyway and neither is that PR very important to me – it just seemed reasonable to me to apply this to Bytes as well, but I guess, closing it and just proceeding with this PR here is the way to go?

sivizius avatar Oct 22 '24 10:10 sivizius