What is the difference between ark-serialize and serde?

sunhuachuang opened this issue 4 years ago • 20 comments

Could serde replace ark-serialize? Forgive my ignorance; there may be something special about it. Serialization is also very transitive: once it is used, it needs to be used everywhere. In the Rust ecosystem, serde has become the de facto standard: many libraries only need derive(Serialize, Deserialize) to get automatic serialization, and there are bincode, postcard, and other ecosystem libraries that build on it.

sunhuachuang avatar Jan 18 '21 03:01 sunhuachuang

I have the same question :D

mimoo avatar Oct 01 '21 23:10 mimoo

Pratyush added this as a documentation improvement goal. I guess there is a reason to separate it from the existing serde.

A probable reason is that we want different types of serialization: checked + compressed, checked + uncompressed, unchecked + compressed

Serde may not support this flexibility. In practice, you do have situations where checking is unnecessary and too expensive, and other situations where the inputs must be checked.
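For concreteness, here is a minimal sketch of such a combination, using the 0.4-style serialize_with_mode/deserialize_with_mode API that comes up later in this thread (the function itself is illustrative, not part of ark-serialize):

use ark_serialize::{CanonicalDeserialize, CanonicalSerialize, Compress, SerializationError, Validate};

// Round-trip a value: compressed on the wire, but skip the expensive
// validity/subgroup checks on read (e.g. for bytes from a trusted local cache).
fn roundtrip_trusted<T: CanonicalSerialize + CanonicalDeserialize>(
    t: &T,
) -> Result<T, SerializationError> {
    let mut bytes = Vec::new();
    t.serialize_with_mode(&mut bytes, Compress::Yes)?;
    T::deserialize_with_mode(&bytes[..], Compress::Yes, Validate::No)
}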

weikengchen avatar Oct 02 '21 00:10 weikengchen

In the meantime, you can work around that by using #[serde(with)] or serde_with for serialization when containers are involved:

use ark_serialize::{CanonicalDeserialize, CanonicalSerialize};
use serde_with::{DeserializeAs, SerializeAs};

/// Adapter that (de)serializes any arkworks type via its canonical byte encoding.
pub struct SerdeAs;

impl<T> SerializeAs<T> for SerdeAs
where
    T: CanonicalSerialize,
{
    fn serialize_as<S>(val: &T, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        // Write the canonical encoding (compressed, for curve points) into a buffer ...
        let mut bytes = vec![];
        val.serialize(&mut bytes)
            .map_err(serde::ser::Error::custom)?;

        // ... and hand the raw bytes to serde.
        serde_with::Bytes::serialize_as(&bytes, serializer)
    }
}

impl<'de, T> DeserializeAs<'de, T> for SerdeAs
where
    T: CanonicalDeserialize,
{
    fn deserialize_as<D>(deserializer: D) -> Result<T, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        // Read the raw bytes back and run them through the canonical (checked) decoder.
        let bytes: Vec<u8> = serde_with::Bytes::deserialize_as(deserializer)?;
        T::deserialize(&mut &bytes[..]).map_err(serde::de::Error::custom)
    }
}

And you can use it like this:

// `E`, `D`, and `PERMUTS` are type aliases / constants from the surrounding codebase.
#[serde_as]
#[derive(Clone, Serialize, Deserialize)]
pub struct ConstraintSystem<F: FftField> {
    #[serde_as(as = "o1_utils::evaluations::SerdeAs")]
    pub addl: E<F, D<F>>,
    #[serde_as(as = "[o1_utils::evaluations::SerdeAs; PERMUTS]")]
    pub sigmal8: [E<F, D<F>>; PERMUTS],
}

mimoo avatar Oct 02 '21 01:10 mimoo

I think serde makes little sense for lower-level cryptography crates, given serde's complexity. Anyone doing higher-level protocols could implement serde for their own composite structures using the lower-level crate's byte serialization.
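For illustration, a minimal sketch of that approach (the MyProof type and its fields are hypothetical; serialize here is the ark 0.3-era method used in the snippet above, serialize_compressed in 0.4):

use ark_serialize::CanonicalSerialize;
use serde::ser::{Serialize, SerializeStruct, Serializer};

// A higher-level, protocol-owned struct: serde lives here, and the inner
// arkworks value goes through its canonical byte encoding.
struct MyProof<P: CanonicalSerialize> {
    proof: P,
    round: u32,
}

impl<P: CanonicalSerialize> Serialize for MyProof<P> {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Encode the arkworks value to bytes first ...
        let mut bytes = Vec::new();
        self.proof
            .serialize(&mut bytes)
            .map_err(serde::ser::Error::custom)?;
        // ... then emit a plain serde struct containing those bytes.
        let mut s = serializer.serialize_struct("MyProof", 2)?;
        s.serialize_field("proof", &bytes)?;
        s.serialize_field("round", &self.round)?;
        s.end()
    }
}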

burdges avatar Oct 03 '21 10:10 burdges

I'm not sure I would agree with that. In our case, without the workaround I just described, we'd have to implement large custom serialization/deserialization functions (which would kind of void the benefits of using generated code) or use wrappers around all arkworks types.

mimoo avatar Oct 03 '21 17:10 mimoo

A priori, cryptographic protocols should specify the bitwise layout of their serialization formats, well above the layer where arkworks operates, and not permit serde-like reinterpretation. Also, multi-target serialization formats like serde have a poor record for preserving signature or MAC validity.

burdges avatar Oct 03 '21 23:10 burdges

Do you have an attack or issue in mind if serde is used at this layer?

mimoo avatar Oct 04 '21 16:10 mimoo

It's just fragile and likely to break. There is a whole messy standard for canonicalizing and signing XML documents specifically because it doesn't make much sense to sign an XML document from inside the XML document itself. If you have a Rust struct that contains a signature or MAC over some of its members, then serde would not produce the proper XML. Afaik, anything else human-readable like JSON exhibits similar problems, but if they have no standard then at least you're not deviating from the standard.

Arkworks goes way beyond signatures and MACs so protocols here contain more extensive interaction between cryptography and data, making all this much worse. It'll surely avoid headaches if you manually specify appropriate binary formats for serialization and deserialization.

At a deeper level, and as noted above, there exist slow serialization and deserialization operations used in pairing-based protocols, especially for G_T but even for G_2, so some protocols need optimizations for exactly the correct operations, sometimes including human-checked mathematical arguments for why some check is unnecessary.

burdges avatar Oct 05 '21 07:10 burdges

I can see how it can be dangerous in a network protocol or to canonically serialize bytes before hashing/MAC'ing/signing them, but serialization is also useful for storing to disk or other less threatening use-cases. The dalek libraries, for example, provide serde serialization behind a feature flag. Also, you can use serde to write canonical serialization; see what we did at Diem: https://github.com/novifinancial/serde-reflection/#benefits

mimoo avatar Oct 05 '21 17:10 mimoo

@burdges I'm not sure I get which characteristic of serde, specifically, leads to the fragility you have in mind: serde is just a framework for providing (fully customizable) serializers and composing them through macro-derivation and a trait hierarchy.

In a word, what's wrong with the discipline of descending through the fields of an object, if you are in total control of what happens at each field? The reason I'm asking is essentially that by doing this:

cryptographic protocols should specify the bitwise layout of their serialization formats, well above the layer where arkworks operates, and not permit serde-like reinterpretation.

You're (IMHO) making it harder for others to reuse and compose any bitwise serialization you might define, at whichever level you find suitable. It's well and good to define compressed and uncompressed point serialization and be picky about which is used where. But because of, essentially, the orphan rule in Rust's trait system, it imposes newtypes on the user (as @mimoo demonstrated) who just wants to pick one as a way of saving a point to disk (see also C-COMMON-TRAITS).


In more detail:

Serde happens to provide a serialization default for Rust primitive types, and opt-in encoding libraries that target a number of (mostly self-describing) destination formats (JSON, YAML, etc.). But that's incidental: the workhorse of serde is the trait hierarchy, and you do not have to adopt any of the default options [^1].

Moreover, because serde relies on traits and has annotation-based trapdoors, you can control the code paths that will be used for portions of an object as you will, from vanilla custom implementations of serde::(Des|S)erialize for XX to enforcing mandatory serde code paths across a trait hierarchy [^2], to breaking out of the default trait implementations of a struct when it's present (embedded) in a specific context (object).

Finally, if you want hashing to ignore certain contents, because some of those contents may themselves include hashes, I might add that I've found it much easier to write custom hashing implementations (and recursively, a hashing hierarchy) than to write a custom serialization from scratch just to be able to have your hasher eat those bytes (I can detail). But maybe I do not fully understand what you have in mind.

[^1]: For example, Diem/Libra happens to use serde-reflection to describe what the semantic model of its data types should look like (through an IDL), and bcs to describe what the production of output bytes should look like, based on that model [^3]. Both are compatible with the trait hierarchy of serde, simplifying the UX, while retaining total control over the serialization machinery.

[^2]: For example, Diem happens to use proc-macros to ensure any deserialization of key material must run through the TryFrom implementation on the key, where we run additional checks (e.g. subgroup checks, where applicable); a small sketch of this pattern follows these footnotes.

[^3]: Note that the entire point of bcs is to ensure deterministic serialization / deserialization, which may be of ancillary benefit to this topic, given ark-serialize's workhorse trait CanonicalSerialize.
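As an aside, here is a minimal sketch of the pattern footnote 2 describes, transposed to arkworks types and using serde's built-in try_from container attribute rather than a proc-macro (all names hypothetical, not Diem's actual code; ark-serialize 0.4 API assumed):

use ark_serialize::CanonicalDeserialize;
use core::convert::TryFrom;

// Every deserialization path is forced through `TryFrom`, which is where the
// on-curve and subgroup checks happen.
#[derive(serde::Deserialize)]
#[serde(try_from = "Vec<u8>")]
pub struct PublicKey(ark_bls12_381::G1Affine);

impl TryFrom<Vec<u8>> for PublicKey {
    type Error = String;

    fn try_from(bytes: Vec<u8>) -> Result<Self, Self::Error> {
        // `deserialize_compressed` validates the point (subgroup check included).
        ark_bls12_381::G1Affine::deserialize_compressed(&bytes[..])
            .map(PublicKey)
            .map_err(|_| String::from("invalid or out-of-subgroup public key"))
    }
}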

huitseeker avatar Oct 05 '21 17:10 huitseeker

I think I will need to read into this a little more.

For me, the flexibility to deserialize G_1/G_2 encodings with or without subgroup checks, and with or without compression, would be crucial, as these choices have been shown to affect the performance of certain applications severely.

weikengchen avatar Oct 05 '21 17:10 weikengchen

@weikengchen There's a way to keep that flexibility around, as follows: make Uncompressed/Compressed wrapper types that hardcode the respective type of serialization/deserialization, and impl serde::Serialize/Deserialize on these:

pub struct UncompressedUnchecked<T>(T);
pub struct UncompressedChecked<T>(T);
pub struct CompressedUnchecked<T>(T);
pub struct CompressedChecked<T>(T);

impl<T: CanonicalSerialize> CanonicalSerialize for UncompressedUnchecked<T> {
	// always invoke `T::serialize_uncompressed()`
}
impl<T: CanonicalDeserialize> CanonicalDeserialize for UncompressedUnchecked<T> {
	// always invoke `T::deserialize_uncompressed_unchecked()`
}
impl<T: CanonicalSerialize> serde::Serialize for UncompressedUnchecked<T> {
	// always invoke `CanonicalSerialize::serialize_uncompressed()`
}
impl<'de, T: CanonicalDeserialize> serde::Deserialize<'de> for UncompressedUnchecked<T> {
	// always invoke `CanonicalDeserialize::deserialize_uncompressed_unchecked()`
}
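To make one pair concrete, the serde impls could look roughly like this (a sketch, assuming ark_serialize's 0.4 method names and that CanonicalSerialize/CanonicalDeserialize are in scope; not actual library code):

impl<T: CanonicalSerialize> serde::Serialize for UncompressedUnchecked<T> {
    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Always write the uncompressed canonical encoding.
        let mut bytes = Vec::new();
        self.0
            .serialize_uncompressed(&mut bytes)
            .map_err(serde::ser::Error::custom)?;
        serializer.serialize_bytes(&bytes)
    }
}

impl<'de, T: CanonicalDeserialize> serde::Deserialize<'de> for UncompressedUnchecked<T> {
    fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // Always read uncompressed and skip the validity checks.
        let bytes: Vec<u8> = serde::Deserialize::deserialize(deserializer)?;
        T::deserialize_uncompressed_unchecked(&bytes[..])
            .map(UncompressedUnchecked)
            .map_err(serde::de::Error::custom)
    }
}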

The expected usage from an end-user's perspective is as follows

#[derive(Serialize, Deserialize)]
pub struct Transaction {
	// other `serde` compatible stuff
	pub tx_proof: CompressedChecked<groth16::Proof<Bls12_381>>,
}

Pratyush avatar Oct 05 '21 19:10 Pratyush

Then the question becomes this: what if I want to be able to serialize a transaction in two different ways, one compressed+checked, one uncompressed+unchecked?

Basically, how the choice of "wrapper" gets passed around. Note that this could be nontrivial: it is possible that for certain applications we intentionally don't check at a specific level, for a specific entry. When compressing, we may also get rid of all the prepared field elements, while keeping them when not compressing.

weikengchen avatar Oct 05 '21 19:10 weikengchen

Ah, I think I am thinking too much: in that case, we just don't use "derive".

weikengchen avatar Oct 05 '21 19:10 weikengchen

I think those kinds of decisions probably require custom implementations of CanonicalSerialize/Deserialize and corresponding custom impls of the serde traits too. I think it's not possible to predict and account for everybody's custom requirements.

Pratyush avatar Oct 05 '21 19:10 Pratyush

The wrapper also implies a cost though: now you need .0 to access the element inside the wrapper.

I feel maybe a good idea is to be compatible with serde, but extend it a little bit. That is, we can add "annotations" like:

#[serde_preferences(compressed, checked)]

I need to study serde a little bit, maybe it is already powerful enough to support different extensions.

weikengchen avatar Oct 05 '21 19:10 weikengchen

I think the idea would be that Transaction is a network-facing struct, which would be converted to something more user friendly before using in the rest of the system.
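Something like this, perhaps (all types illustrative; CompressedChecked is the wrapper sketched above):

// Wire-facing struct: the serialization policy is pinned by the wrapper type.
struct Proof; // stand-in for e.g. groth16::Proof<Bls12_381>
struct CompressedChecked<T>(T); // as sketched earlier in the thread

struct WireTransaction {
    tx_proof: CompressedChecked<Proof>,
}

// Internal struct: bare types, no `.0` noise in the rest of the codebase.
struct Transaction {
    tx_proof: Proof,
}

impl From<WireTransaction> for Transaction {
    fn from(w: WireTransaction) -> Self {
        Transaction { tx_proof: w.tx_proof.0 }
    }
}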

Pratyush avatar Oct 05 '21 19:10 Pratyush

@weikengchen It's perfectly OK to have a special-purpose API for special-purpose things. Then what serde makes you define as an impl on a commonly-used type is just a default. If your user wants to knowingly call into a non-default API compressed_checked_deserialize, they also can. If they want to use that to override the default when your struct is used in a certain context, such as when it is embedded in another specific struct, they can use e.g. serialize_with to implement that override and bypass the default trait implementation.

huitseeker avatar Oct 05 '21 21:10 huitseeker

I feel that there is a consensus to have some support for serde as long as it is doable, and we just need a plan to merge the API while keeping the same functionality and friendliness.

weikengchen avatar Oct 05 '21 21:10 weikengchen

Very useful discussion! I think adding serde support is an engineering-friendly thing; it makes things convenient for library users.

sunhuachuang avatar Oct 06 '21 02:10 sunhuachuang

For ark 0.4, we can do it as below:

use ark_serialize::{CanonicalDeserialize, CanonicalSerialize, Compress, Validate};
use serde::{Deserialize, Serialize};

fn ark_se<S, A: CanonicalSerialize>(a: &A, s: S) -> Result<S::Ok, S::Error>
where
    S: serde::Serializer,
{
    // Serialize the arkworks value into its compressed canonical bytes ...
    let mut bytes = vec![];
    a.serialize_with_mode(&mut bytes, Compress::Yes)
        .map_err(serde::ser::Error::custom)?;
    // ... and emit those bytes through serde.
    s.serialize_bytes(&bytes)
}

fn ark_de<'de, D, A: CanonicalDeserialize>(data: D) -> Result<A, D::Error>
where
    D: serde::de::Deserializer<'de>,
{
    // Read the bytes back and decode them with full validation.
    let s: Vec<u8> = serde::de::Deserialize::deserialize(data)?;
    let a = A::deserialize_with_mode(s.as_slice(), Compress::Yes, Validate::Yes);
    a.map_err(serde::de::Error::custom)
}

#[derive(Deserialize, Serialize)]
pub struct Example {
    #[serde(serialize_with = "ark_se", deserialize_with = "ark_de")]
    a: Fr,
    #[serde(serialize_with = "ark_se", deserialize_with = "ark_de")]
    b: Vec<Affine>,
}
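A quick round trip with that snippet might look as follows (assuming Fr/Affine are e.g. ark_bls12_381::Fr/ark_bls12_381::G1Affine and bincode 1.x as the format; purely illustrative):

use ark_bls12_381::{Fr, G1Affine as Affine};
use ark_ec::AffineRepr;

fn main() {
    let e = Example {
        a: Fr::from(42u64),
        b: vec![Affine::generator()],
    };
    // Any serde format works; bincode is just one choice.
    let bytes = bincode::serialize(&e).expect("serialize");
    let _back: Example = bincode::deserialize(&bytes).expect("deserialize");
}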

This is also a very convenient way, so I think we can close this issue.

sunhuachuang avatar Feb 02 '23 06:02 sunhuachuang

This is true, but it is only valid for new structs defined by dependent libraries, rather than providing a Serialize implementation for a native arkworks structure.

The C-COMMON-TRAITS API guidelines expand on the reasons why having those implementations is valuable, and getting those implementations was at the core of the scope of this issue. It's unfortunate that it's closed.

huitseeker avatar Feb 02 '23 20:02 huitseeker

It's all moot anyway: ark-serialize impls should handle the downgrade from full subgroup checks to on-curve checks plus appropriate cofactor calls. An arkworks ed25519 implementation would serialize correctly using sunhuachuang's example code, but cannot deserialize correctly using any deeper serde integration.

It's not only ed25519, since "jubjub" curves often have small cofactors. I warned vaguely about this sort of thing above, but never realized how wrong a deep serde integration already is. I'd say this issue looks dead.

burdges avatar Feb 02 '23 22:02 burdges