msgpack-rust

rmp-serde: Failed to deserialize &[u8]

Open · sudeep9 opened this issue 7 years ago · 5 comments

I have a basic question about the following code (rmp-serde version 0.13.7):

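```rust
// Imports assumed; the original snippet omitted them
// (serde with the "derive" feature, rmp-serde 0.13.x).
use rmp_serde::{Deserializer, Serializer};
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, Serialize)]
struct Data<'a> {
    buf: &'a [u8]
}

// Error type assumed: the original used an unspecified `Error`; a boxed error
// lets `?` accept both the encode and the decode error types.
fn codec() -> Result<(), Box<dyn std::error::Error>> {
    let buf = b"hello";

    let d = Data{buf: buf.as_ref()};

    let mut outbuf = Vec::new();
    d.serialize(&mut Serializer::new_named(&mut outbuf))?;

    let inbuf = outbuf.as_slice();
    let mut de = Deserializer::from_slice(inbuf);
    // Fails here: Serde serializes &[u8] as a sequence by default, so `buf` was
    // written as a MessagePack array, which cannot be borrowed as a byte slice.
    let d2: Data = serde::Deserialize::deserialize(&mut de)?;
    println!("decoded data = {:?}", d2);

    Ok(())
}
```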

Deserialization fails with the error Error: Syntax("invalid type: sequence, expected a borrowed byte array"). The code works if buf: &'a [u8] is changed to buf: &'a str.

Does &'a [u8] need different treatment?

sudeep9, Feb 28 '18

Hello @sudeep9, I resolved this issue by using the serde_bytes crate.

mgxm, Mar 14 '18

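It should be the default behaviour to serialise Vec<u8> the way serde_bytes does. Without it, bytes above 0x7F, such as FD FE FF, are serialised into a MessagePack fixarray (0x9X), [81, A1, 61, 93, CC, FD, CC, FE, CC, FF], instead of [81, A1, 61, C4, 03, FD, FE, FF], which uses bin 8 (0xC4). (The leading 81, A1, 61 is the one-entry map with key "a" that wraps the value.) In the array form every byte above 0x7F needs a 0xCC (uint 8) prefix, so for uniformly random data, where about half the bytes exceed 0x7F, the serialised form is about 1.5x the size of the original binary data, approaching 2x when every byte does.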
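The two encodings in the comment above can be reproduced with a small std-only sketch. The helpers encode_as_fixarray and encode_as_bin8 are hypothetical (they are not rmp-serde functions), and the sketch only handles payloads short enough for a fixarray (under 16 elements) and for bin 8 (up to 255 bytes).

```rust
/// Hypothetical helper: encode bytes as a MessagePack fixarray of integers,
/// which is what happens to Vec<u8> / &[u8] without serde_bytes.
/// Only arrays shorter than 16 elements are handled in this sketch.
fn encode_as_fixarray(bytes: &[u8]) -> Vec<u8> {
    assert!(bytes.len() < 16, "fixarray holds at most 15 elements");
    let mut out = vec![0x90 | bytes.len() as u8]; // 0x9N = fixarray of N elements
    for &b in bytes {
        if b > 0x7F {
            out.push(0xCC); // uint 8 marker, needed for 0x80..=0xFF
        }
        out.push(b); // 0x00..=0x7F fit in a single positive fixint byte
    }
    out
}

/// Hypothetical helper: encode bytes as MessagePack bin 8 (what serde_bytes
/// asks the serializer for): 0xC4, a 1-byte length, then the raw bytes.
fn encode_as_bin8(bytes: &[u8]) -> Vec<u8> {
    assert!(bytes.len() <= 0xFF, "bin 8 holds at most 255 bytes");
    let mut out = vec![0xC4, bytes.len() as u8];
    out.extend_from_slice(bytes); // raw bytes, copied as-is
    out
}

fn main() {
    // fixmap with 1 entry (0x81), then a 1-char fixstr (0xA1) holding "a" (0x61).
    let map_prefix: [u8; 3] = [0x81, 0xA1, 0x61];
    let data: [u8; 3] = [0xFD, 0xFE, 0xFF];

    let mut as_array = map_prefix.to_vec();
    as_array.extend(encode_as_fixarray(&data));
    let mut as_bin = map_prefix.to_vec();
    as_bin.extend(encode_as_bin8(&data));

    assert_eq!(as_array, [0x81u8, 0xA1, 0x61, 0x93, 0xCC, 0xFD, 0xCC, 0xFE, 0xCC, 0xFF]);
    assert_eq!(as_bin, [0x81u8, 0xA1, 0x61, 0xC4, 0x03, 0xFD, 0xFE, 0xFF]);
    println!("array form: {} bytes, bin 8 form: {} bytes", as_array.len(), as_bin.len());
}
```

For these three bytes the program reports 10 bytes for the array form against 8 for bin 8; the gap grows with the payload, since bin 8 has a fixed 2-byte header while the array form pays an extra byte for every value above 0x7F.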

Using serde_bytes is more efficient, but unfortunately it still does not seem to support deserialising into &'a [u8].

misos1, Jul 24 '18

I'm not sure if this has been resolved, but I figured it's better not to leave this in a "Wisdom of the Ancients" state...

To work around not being able to deserialize into &'a [u8], I also used serde_bytes, but with two nearly identical structs that differ only in the buffer type:

  • one that only derives Serialize and uses &'a [u8] (borrowed)
  • one that only derives Deserialize and uses Vec<u8> (owned); technically this one could also derive Serialize, but always serializing through the borrowed struct avoids copying the data unnecessarily

From the borrow checker's perspective this makes some sense: when you're serializing, you only need a borrowed version (the message struct doesn't need to own the buffer), whereas when you're deserializing, the message you get back should own the buffer via a Vec<u8>; otherwise it's not clear who owns the buffer.

Example snippet:

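```rust
use serde::{Deserialize, Serialize};
// `with = "serde_bytes"` also requires the serde_bytes crate as a dependency.

// Used when sending: borrows the buffer, so serializing copies nothing extra.
#[derive(Serialize)]
pub struct FooRef<'a> {
    #[serde(with = "serde_bytes")] // write as MessagePack bin, not an array of ints
    pub buf: &'a [u8],
}

// Used when receiving: owns the decoded buffer.
#[derive(Deserialize)]
pub struct FooOwned {
    #[serde(with = "serde_bytes")]
    pub buf: Vec<u8>,
}
```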

jaxrtech, Dec 01 '20

> While, when you're deserializing, the message you get back with the buffer should own the buffer via a Vec<u8> otherwise it's not clear who owns the buffer.

It makes sense in some use cases, but not in all. Why would it not be clear who owns the buffer?

misos1, Dec 17 '20

By default it's not possible to deserialize anything into &[u8], because slices can't store any data.

You can make the struct borrow from the input it parses; in Serde, you add the #[serde(borrow)] annotation to tell Serde to do that.
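Outside of Serde, "borrowing from the input" can be sketched in plain Rust. The decoder read_bin8 below is hypothetical, handles only the MessagePack bin 8 format, and is not how rmp-serde is implemented. It returns a slice that points into the encoded buffer, so ownership is unambiguous: the input buffer owns the bytes, and the borrow checker ties the struct's lifetime to it.

```rust
/// Hypothetical zero-copy decoder for a single MessagePack bin 8 value
/// (0xC4, 1-byte length, raw bytes). The returned slice points into `input`,
/// so the borrow checker will not let it outlive the buffer it came from.
fn read_bin8<'a>(input: &'a [u8]) -> Option<&'a [u8]> {
    match input {
        [0xC4, len, rest @ ..] => rest.get(..*len as usize),
        _ => None, // any other format (or truncated input) is rejected
    }
}

/// A message whose payload borrows from the encoded input instead of owning it.
#[derive(Debug)]
struct FooRef<'a> {
    buf: &'a [u8],
}

fn main() {
    // `encoded` owns the bytes; `foo` merely points into it.
    let encoded: Vec<u8> = vec![0xC4, 0x03, 0xFD, 0xFE, 0xFF];
    let foo = FooRef { buf: read_bin8(&encoded).expect("valid bin 8 value") };
    assert_eq!(foo.buf, &[0xFDu8, 0xFE, 0xFF][..]);
    // No copy was made: the payload starts two bytes into `encoded`.
    assert_eq!(foo.buf.as_ptr(), encoded[2..].as_ptr());

    // To keep the payload after the input buffer is gone, copy it into an owned Vec.
    let owned: Vec<u8> = foo.buf.to_vec();
    drop(encoded);
    println!("{:?}", owned);
}
```

Moving the println of foo after drop(encoded) would be rejected at compile time, which is the borrowed/owned trade-off discussed above: borrowing avoids the copy, and an owned Vec<u8> is only needed when the data must outlive the input.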

Also try Cow<'a, [u8]> if you want to use either borrowed or owned data.
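A Cow<'a, [u8]> fits a decoder that can borrow in some cases and must allocate in others. In the hypothetical std-only sketch below, read_bytes borrows when the payload is bin 8, where the raw bytes sit contiguously in the input, and builds an owned Vec<u8> when the payload is an array of integers, where bytes above 0x7F are interleaved with 0xCC markers. That is consistent with the "invalid type: sequence, expected a borrowed byte array" error in the original question. This only mirrors the idea; it is not the code Serde or rmp-serde use for Cow.

```rust
use std::borrow::Cow;

/// Hypothetical decoder for a MessagePack byte payload in either encoding
/// discussed in this thread:
/// - bin 8 (0xC4, len, raw bytes): the bytes are contiguous in `input`, so borrow them;
/// - fixarray (0x90..=0x9F) of positive fixints or 0xCC-prefixed uint 8 values:
///   the bytes are interleaved with markers, so collect them into an owned Vec.
fn read_bytes<'a>(input: &'a [u8]) -> Option<Cow<'a, [u8]>> {
    match input {
        [0xC4, len, rest @ ..] => rest.get(..*len as usize).map(Cow::Borrowed),
        [header @ 0x90..=0x9F, rest @ ..] => {
            let count = (*header & 0x0F) as usize;
            let mut out = Vec::with_capacity(count);
            let mut i = 0;
            for _ in 0..count {
                match rest.get(i)? {
                    0xCC => {
                        out.push(*rest.get(i + 1)?); // uint 8: marker, then the value
                        i += 2;
                    }
                    b @ 0x00..=0x7F => {
                        out.push(*b); // positive fixint: the byte is the value
                        i += 1;
                    }
                    _ => return None, // other integer formats are out of scope here
                }
            }
            Some(Cow::Owned(out))
        }
        _ => None,
    }
}

fn main() {
    let bin = [0xC4u8, 0x03, 0xFD, 0xFE, 0xFF];
    let array = [0x93u8, 0xCC, 0xFD, 0xCC, 0xFE, 0xCC, 0xFF];

    let from_bin = read_bytes(&bin).expect("valid bin 8");
    let from_array = read_bytes(&array).expect("valid fixarray");

    assert!(matches!(from_bin, Cow::Borrowed(_))); // zero-copy
    assert!(matches!(from_array, Cow::Owned(_))); // had to allocate
    assert_eq!(from_bin, from_array); // same bytes either way
    println!("{:?}", from_array);
}
```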

kornelski, Dec 19 '20