risc0 icon indicating copy to clipboard operation
risc0 copied to clipboard

[BUG] risc0_zkvm serde fails to deserialize if the serialized type differs

Open zvolin opened this issue 9 months ago • 5 comments

Bug Report

The title is pretty generic but I don't know how to express that cleaner. It's pretty common to use some tricks when implementing Serialize and Deserialize by hand. One such trick is to serialize String and deserialize &str, to avoid double allocations.

This example will fail:

use serde::de::Error as _;
use serde::{Deserialize, Deserializer, Serialize, Serializer};

#[derive(Debug)]
struct Abc(Vec<u8>);

impl Serialize for Abc {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let hex = hex::encode(&self.0);
        serializer.serialize_str(&hex)
    }
}

impl<'de> Deserialize<'de> for Abc {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // use <&str> to avoid intermediate allocation
        let hex = <&str>::deserialize(deserializer)?;
        hex::decode(hex).map(Self).map_err(D::Error::custom)
    }
}

fn main() {
    let abc = Abc(vec![1, 2, 3, 4]);
    let serialized = risc0_zkvm::serde::to_vec(&abc).unwrap();
    let deserialized: Abc = risc0_zkvm::serde::from_slice(&serialized).unwrap();

    println!("{deserialized:?}");
}

Another pattern I'm aware of (eg. tendermint-rs does that) is to serialize T but deserialize Option<T> and have some fallback in case it's not present.

Expected behavior

It would be cool if risc0_zkvm would support such tricks as they're not so uncommon in the ecosystem. Ciborium is an example of binary serializer that supports both of the above, and bincode seems to support the &str trick.

Your Environment

  • risc0-zkvm version: 0.21.0
  • Rust version: cargo 1.76.0 (c84b36747 2024-01-18)
  • Platform/OS: archlinux 6.8.7-zen1-1-zen

zvolin avatar Apr 29 '24 16:04 zvolin

Hey @zvolin! I took a quick look at this and it's looks like the root cause is that risc0's from_slice method delegates to an implementation over a Reader, so borrowing data is disallowed. You'll get the same error (verbatim) if you try to deserialize an &str using serde_json::from_reader.

It seems like fixing this should be relatively straightforward, but I couldn't get the zkvm crate to build locally on my first try, so I'll wait for the actual team to come along :)

preston-evans98 avatar Apr 30 '24 01:04 preston-evans98

As for the Option optimization you mention, I think that's unsound and the fact that it happens to work in some implementations is just coincidence. In the serde data model, Some(T) is not equivalent to T - you use serialize_some for the first case, but not in the second. It would be sound to use #[serde(from = "Option<T>", into = "Option<T>")], but it's unsound to use just one.

This trick seems to be used in Tendermint because they only care about supporting json, which serializes Option<T> and T identically .

preston-evans98 avatar Apr 30 '24 01:04 preston-evans98

Note: to build on main you'll need to install cargo-risczero from main as well. This situation should be resolved once the next release lands.

cargo install --path risc0/cargo-risczero
cargo risczero install

flaub avatar Apr 30 '24 06:04 flaub

As for the Option optimization you mention, I think that's unsound and the fact that it happens to work in some implementations is just coincidence. In the serde data model, Some(T) is not equivalent to T - you use serialize_some for the first case, but not in the second. It would be sound to use #[serde(from = "Option<T>", into = "Option<T>")], but it's unsound to use just one.

I've read the serde data model and I'm not sure I agree.

the Serialize implementation for the data structure is responsible for mapping the data structure into the Serde data model by invoking exactly one of the Serializer methods

the Deserialize implementation for the data structure is responsible for mapping the data structure into the Serde data model by passing to the Deserializer a Visitor implementation that can receive the various types of the data model

But there is no fundamental reason that these mappings need to be straightforward. The Serialize and Deserialize traits can perform any mapping between Rust type and Serde data model that is appropriate for the use case.

This doesn't answer the question directly, but more or less I understand it that Visitor can accept different types, and mapping don't need to be straighforward. There is even example of serializing path as one enum variant and deserializing into other.

I always thought of <Option<T>>::deserialize being a trick to avoid using ugly Visitor api, but my understanding is that if I had a visitor with visit_i32 and visit_none that would be equivalent of <Option<i32>>::deserialize and it would be sound. Not sure too if sound and unsound aren't too big words for this

zvolin avatar Apr 30 '24 07:04 zvolin