risc0
risc0 copied to clipboard
[BUG] risc0_zkvm serde fails to deserialize if the serialized type differs
Bug Report
The title is pretty generic but I don't know how to express that cleaner. It's pretty common to use some tricks when implementing Serialize
and Deserialize
by hand. One such trick is to serialize String
and deserialize &str
, to avoid double allocations.
This example will fail:
use serde::de::Error as _;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
#[derive(Debug)]
struct Abc(Vec<u8>);
impl Serialize for Abc {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let hex = hex::encode(&self.0);
serializer.serialize_str(&hex)
}
}
impl<'de> Deserialize<'de> for Abc {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>,
{
// use <&str> to avoid intermediate allocation
let hex = <&str>::deserialize(deserializer)?;
hex::decode(hex).map(Self).map_err(D::Error::custom)
}
}
fn main() {
let abc = Abc(vec![1, 2, 3, 4]);
let serialized = risc0_zkvm::serde::to_vec(&abc).unwrap();
let deserialized: Abc = risc0_zkvm::serde::from_slice(&serialized).unwrap();
println!("{deserialized:?}");
}
Another pattern I'm aware of (eg. tendermint-rs does that) is to serialize T
but deserialize Option<T>
and have some fallback in case it's not present.
Expected behavior
It would be cool if risc0_zkvm would support such tricks as they're not so uncommon in the ecosystem. Ciborium is an example of binary serializer that supports both of the above, and bincode seems to support the &str
trick.
Your Environment
- risc0-zkvm version: 0.21.0
- Rust version: cargo 1.76.0 (c84b36747 2024-01-18)
- Platform/OS: archlinux 6.8.7-zen1-1-zen
Hey @zvolin! I took a quick look at this and it's looks like the root cause is that risc0's from_slice
method delegates to an implementation over a Read
er, so borrowing data is disallowed. You'll get the same error (verbatim) if you try to deserialize an &str
using serde_json::from_reader
.
It seems like fixing this should be relatively straightforward, but I couldn't get the zkvm
crate to build locally on my first try, so I'll wait for the actual team to come along :)
As for the Option
optimization you mention, I think that's unsound and the fact that it happens to work in some implementations is just coincidence. In the serde data model, Some(T)
is not equivalent to T
- you use serialize_some
for the first case, but not in the second. It would be sound to use #[serde(from = "Option<T>", into = "Option<T>")]
, but it's unsound to use just one.
This trick seems to be used in Tendermint because they only care about supporting json
, which serializes Option<T>
and T
identically .
Note: to build on main you'll need to install cargo-risczero
from main as well. This situation should be resolved once the next release lands.
cargo install --path risc0/cargo-risczero
cargo risczero install
As for the Option optimization you mention, I think that's unsound and the fact that it happens to work in some implementations is just coincidence. In the serde data model, Some(T) is not equivalent to T - you use serialize_some for the first case, but not in the second. It would be sound to use #[serde(from = "Option<T>", into = "Option<T>")], but it's unsound to use just one.
I've read the serde data model and I'm not sure I agree.
the Serialize implementation for the data structure is responsible for mapping the data structure into the Serde data model by invoking exactly one of the Serializer methods
the Deserialize implementation for the data structure is responsible for mapping the data structure into the Serde data model by passing to the Deserializer a Visitor implementation that can receive the various types of the data model
But there is no fundamental reason that these mappings need to be straightforward. The Serialize and Deserialize traits can perform any mapping between Rust type and Serde data model that is appropriate for the use case.
This doesn't answer the question directly, but more or less I understand it that Visitor can accept different types, and mapping don't need to be straighforward. There is even example of serializing path as one enum variant and deserializing into other.
I always thought of <Option<T>>::deserialize
being a trick to avoid using ugly Visitor
api, but my understanding is that if I had a visitor with visit_i32
and visit_none
that would be equivalent of <Option<i32>>::deserialize
and it would be sound. Not sure too if sound and unsound aren't too big words for this