msgpack-rust icon indicating copy to clipboard operation
msgpack-rust copied to clipboard

Why serialize newtype struct as array?

Open SX91 opened this issue 8 years ago • 15 comments

https://github.com/3Hren/msgpack-rust/blob/master/rmp-serde/src/encode.rs#L375

Newtype is simply a wrapper around inner data, so it should encode/decode as inner data. You shouldn't really treat it as one-field struct IMO. The same goes for newtype_variant: maybe it should be encoded as [index, value]? It would be the most compact representation.

SX91 avatar May 16 '17 06:05 SX91

Since there is no standard description how these types should be serialized using various encoders each crate does it on its own.

The best thing I can offer is a some kind of configurable Serializer.

3Hren avatar May 16 '17 13:05 3Hren

@3Hren newtype struct is a special (type-system only) case, almost like in Haskell with it's newtype keyword. It's a type-system level wrapper around some other type. So it should be treated as a special case. serialize_newtype_struct has been added to Serde exactly for this purpose.

Consider some example:

struct MyVecWrapper(Vec<u32>);
enum MyEnum {
    Variant1(MyVecWrapper),
    Variant2(MyVecWrapper, MyVecWrapper),
}

MyEnum::Variant1(MyVecWrapper(vec![...])) would be serialized as [0, [[[...]]]], wouldn't it? That's quite a lot of overhead, while it could be simple [0, [...]] like with Haskell's data-msgpack package.

SX91 avatar May 16 '17 13:05 SX91

I got the point, thanks. Still, I need to check whether some of my crates has already tied with pack-as-array behaviour.

3Hren avatar May 16 '17 17:05 3Hren

@3Hren could you please explain https://github.com/3Hren/msgpack-rust/blob/master/rmpv/src/ext/de.rs#L717 ? newtype_variant is enum A {A(i32)}, right? So there's no way there could be T..., isn't it?

SX91 avatar Aug 09 '17 10:08 SX91

Probably a comment bug. Need to remember how it dammit works actually :).

3Hren avatar Aug 09 '17 10:08 3Hren

What do you think should we do with unit structs/variants? Do nothing? Nil? Or leave serialization as an empty array?

May be make it configurable?

3Hren avatar Sep 18 '17 20:09 3Hren

@3Hren Haskell's data-msgpack uses Nil representation for units. For unit variant we should use simple [code] encoding. So:

// Nil
struct Unit
// [code]
enum Enum {
    A,
    B
}

I don't think configurations are required here.

SX91 avatar Sep 20 '17 13:09 SX91

Feels inconsistent. Why not encode unit variant with nil then:

// [code, nil]
enum Enum {
    A,
    B
}

3Hren avatar Sep 20 '17 13:09 3Hren

But why? Unit struct would be encoded as Nil because it's the most compact representation while unit variants would be just [code] (the most compact would be code, but that one actually is inconsistent). Consider this example:

enum Enum {
    A,
    B(i32),
    C(i32, i32),
}
// Enum::A - [0]
// Enum::B(0) - [1, 0]
// Enum::C(0, 1) - [2, [0, 1]]

This format is compact, unambiguous and separates unit variant, newtype variant and tuple (struct) variant. Structs would be consistent and unambiguous too:

struct A; // Nil
struct B(i32); // Value
struct C(i32, i32); // [value, value]

SX91 avatar Sep 20 '17 13:09 SX91

Consider:

enum Enum {
    A, // [0]
    B(i32), // [1, i32]
    C((i32,)), // [2, [i32]]
    D(i32, i32), // [3, [i32, i32]]
    E((i32, i32)), // [4, [i32, i32]]
}

I feel this subjectively inconsistent. One can say that it's not. Why not to serialize D(i32, i32) as [3, i32, i32] without inner array and so on.

3Hren avatar Sep 20 '17 14:09 3Hren

I don't see any inconsistensy here.

Why not to serialize D(i32, i32) as [3, i32, i32] without inner array and so on.

Newtype variant is a simple value wrapper, while struct/tuple variant is actually a structure. Think of a parser that should

  1. choose constructor (variant)
  2. parse value

For units it would end up with only choosing a constructor. For others (newtypes and structs/tuples) it would parse value too, which is a value for newtype variant, map/array for struct/tuple. Serde separates units, newtypes, tuples and structs, so we should just use that.

SX91 avatar Sep 20 '17 14:09 SX91

I just found out that a newtype struct is serialized as fixarray (0x91). I concur with @SX91 that newtypes are a special case and that they should be transparent. To quote from Serde docs:

Data formats are encouraged to treat newtype structs as insignificant wrappers around the inner value, serializing just the inner value. A reasonable implementation would be to forward to value.serialize(self)

In the meantime, as a workaround, manually implementing Serialize and Deserialize for the newtype is probably the way to go, right?

Edit: Example for serializing a newtype containing an u8 array:

use serde::ser::{Serialize, Serializer, SerializeSeq};

impl Serialize for Cookie {
    fn serialize<S>(&self, serializer: S) -> ::std::result::Result<S::Ok, S::Error>
            where S: Serializer {
        let mut seq = serializer.serialize_seq(Some(self.0.len()))?;
        for byte in self.0.iter() {
            seq.serialize_element(byte)?;
        }
        seq.end()
    }
}

Deserializing is a bit more complicated.

dbrgn avatar Oct 30 '17 08:10 dbrgn

@dbrgn I'd recommend using master branch for now as those changes are already there.

SX91 avatar Oct 31 '17 17:10 SX91

Yep, these are on master. I can't currently publish a new release, because I'm in the middle of developing a tool that tries to compile dependent crates and run their tests to figure out what will be broken after publishing (I know about semver, but sudden breaking isn't good). If there are - I feel that I need some kind of migration guide to be done before publishing.

3Hren avatar Oct 31 '17 17:10 3Hren

I'd recommend using master branch for now as those changes are already there.

Ah, awesome!

dbrgn avatar Nov 01 '17 09:11 dbrgn