algebra Celo: Serialization changes

Description

Copies celo's serialization changes from #127

The key change is it adds serialize_uncompressed_unchecked.

Currently the implication of serialize_unchecked is to serialize in the uncompressed form, so this PR makes sense in the context of needing to serialize unchecked, compressed data.

The usecase for that is not clear to me, as the check needed for deserializing an affine curve point on a prime order curve is effectively the same as getting the affine representation of the point. (We currently don't implement arithmetic in montgomery form AFAIK)

Is the usecase for skipping subgroup checks for compressed curve points?

@jon-chuang, @Pratyush, is the above correct for where serializing unchecked compressed comes in handy?

I'll handle fixing the merge conflicts, and copying over the changes to make elliptic curves implement this if we agree on the need for the change.

We also have the question of whether the default should be for serialize_unchecked to be compressed or uncompressed. The current form of this PR is a breaking change that will fail silently in many places, by changing the implementation of serialize_unchecked.

Not sure what the best API to be using here right now is.

Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why.

[x] Targeted PR against correct branch (main)
[ ] Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
[ ] Wrote unit tests
[ ] Updated relevant documentation in the code
[ ] Added a relevant changelog entry to the Pending section in CHANGELOG.md
[ ] Re-reviewed Files changed in the Github PR explorer

Dec 13 '20 07:12 ValarDragon

Is the usecase for skipping subgroup checks for compressed curve points?

Yes, this is the intended usecase, I believe. I’m not sure what the overhead of decompression vs subgroup checks is for common curves

Dec 13 '20 17:12 Pratyush

@jon-chuang Could you elaborate further on the applications/potential use cases for this? Isn't taking the square root pretty expensive?

Dec 14 '20 17:12 Pratyush

Hi, yes, the use case is indeed to skip the subgroup check. We can use a batched subgroup check instead which is about 30-75x faster, iirc.

Dec 15 '20 02:12 jon-chuang

It would be nice to specialise the impl for vec<T> or [T] where T: AffineCurve etc. to perform the batch subgroup check. Perhaps a change for the future.

Dec 15 '20 02:12 jon-chuang

Hmm since it's useful, I propose renaming (de)serialize_unchecked to (de)serialize_compressed_unchecked. I am undecided on whether we should also rename serialize to serialize_compressed in the process. WDYT @ValarDragon @jon-chuang @kobigurk?

Dec 16 '20 03:12 Pratyush

Upon further thought, I personally dont like the idea of having checked and unchecked in the API, and am more in favour of one of the two solutions:

Remove such checks entirely, then the responsibility/manner of checking falls on the programmer
create a trait for each type representing the batch version of checking functions, with a default impl doing nothing, and impl for curves doing a batch check, and have this function invoked for the slice and vec data types. Well, this is probably pretty orthogonal to the question though.

Dec 16 '20 03:12 jon-chuang

Remove such checks entirely, then the responsibility/manner of checking falls on the programmer

I feel like this is probably going to lead to protocol problems elsewhere, and furthermore breaks the contracts of the curve types (which assume that the input is in the prime order subgroup)

Dec 16 '20 03:12 Pratyush

We could try playing around with adding a default parameter to the serialize and serialize_uncompressed methods, so that by default it performs the checks, but if you change the default type parameter to a different one (which avoids the checks), then it does not perform on-curve checks.

Dec 16 '20 03:12 Pratyush

Hmm yeah, I guess the solution proposed is meant to allow for minimal breakage.

I guess compression and data integrity checks do form a reasonable quadrant on the type of (de)serialisation one would like generically speaking. One however only cares about toggling data integrity checks if they are relatively expensive, so they matter little to e.g. a bool deserialization.

Would you mind giving an example of the type parameter method?

Dec 16 '20 03:12 jon-chuang

we could separate the unchecked serialization methods into its own trait, that is auto-implemented if canonical serialize is implemented

Dec 16 '20 03:12 ValarDragon

Im in favor of the approach of having 4 methods

* serialize_compressed
* serialize_uncompressed
* serialize_compressed_unchecked
* serialize_uncompressed_unchecked

I think its fine to expose the four options here, and let downstream applications decide which method they want, and which methods they want to expose.

Dec 16 '20 03:12 ValarDragon

Hmm since it's useful, I propose renaming (de)serialize_unchecked to (de)serialize_compressed_unchecked. I am undecided on whether we should also rename serialize to serialize_compressed in the process. WDYT @ValarDragon @jon-chuang @kobigurk?

Wrt to this question, are you proposing to have e.g. deserialize_compressed.. alongside deserialize_uncompressed.., only for the unchecked version of the methods?

Dec 16 '20 03:12 jon-chuang

Im in favor of the approach of having 4 methods
* serialize_compressed
* serialize_uncompressed
* serialize_compressed_unchecked
* serialize_uncompressed_unchecked
I think its fine to expose the four options here, and let downstream applications decide which method they want, and which methods they want to expose.

Hmm, well barring curves, most types do not have an option to compress, so shouldn't we have a default that leads to minimal breakage?

Dec 16 '20 03:12 jon-chuang

Here's some example code, that allows incorporating the batch checks as well:

trait CanonicalDeserialize {
	Specifies how the check should be performed
	type Validator: Checker<Self>;
	
	fn deserialize<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get item from R
		if C::should_check() {
			assert!(Self::Validator::check(&[item]));
		}
		item
	}

	
	fn deserialize_uncompressed<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get uncompressed item from R
		if C::should_check() {
			assert!(Self::Validator::check(&[item]));
		}
		item
	}
}

/// Actually performs the check
trait Checker<T> {
	fn check(&[T]) -> bool;
}

trait ShouldCheck {
	fn should_check() -> bool;
}

struct AlwaysCheck;

impl ShouldCheck for AlwaysCheck {
	fn should_check() -> bool {
		true
	}
}

struct NoCheck;

impl ShouldCheck for NoCheck {
	fn never_check() -> bool {
		false
	}
}

Here's how you would use it to do both checked and unchecked serialization:

let item = T::deserialize(&mut reader)?;
let item = T::deserialize::<_, NoCheck>(&mut reader)?;

and here's how you would use the Checker API to implement batch checking without specialization


impl<T: CanonicalDeserialize> CanonicalDeserialize for Vec<T> {
	type Validator = VecChecker<T>;
	
	fn deserialize<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get items from R
		if C::should_check() {
			assert!(T::Validator::check(items.as_slice()));
		}
		item
	}
}

struct VecChecker<T>(PhantomData<T>);
impl<T: CanonicalDeserialize> Checker<Vec<T>> for VecChecker<T> {
	fn check(items:&[Vec<T>]) -> bool {
		for item in items {
			T::check(item.as_slice())
		}
	}
}

The idea would be to have different implementations of checkers for different types.

Dec 16 '20 03:12 Pratyush

Note that the changes relating to the Validator associated type are completely orthogonal to the changes related to the ShouldCheck trait; we could adopt either, neither, or both.

Dec 16 '20 03:12 Pratyush

Here's some example code, that allows incorporating the batch checks as well:

I like this strategy as it exposes optional fine grained control with fewer breaking changes and more separation of duties.

I don't, however, like the use of type parameters as an optional argument...

Dec 16 '20 03:12 jon-chuang

Instead of the above approach, I propose instead to have a trait with overridable default impls

trait ValidData {
    fn validate(value: Self) -> Result {
        Ok(())
    }
    fn validate_batch(values: &[Self]) -> Result {
        for value in values {
            Self::check(value)?;
        }
        Ok(())
    }
}

Then one writes for impl<T: ValidData> CanonicalDeserialize for Vec<T>

fn deserialize<.., C: ..>(...) {
   for ... {
       deserialize(..); // use default
   }
   if C::should_validate() {
       T::validate_batch(values)?;
   }
}

Also validate sounds more precise.

Dec 16 '20 04:12 jon-chuang

Hi, @Pratyush @ValarDragon, could I get permission to push to non-protected branches?

Dec 21 '20 02:12 jon-chuang

Btw, where is serialize_derive actually used? I can't seem to find it used anywhere. For instance, derive_canonical_deserialize isn't.

Dec 21 '20 02:12 jon-chuang

The serialize-derive repo is often used in a project by adding a dependency on serialize with the feature derive. https://github.com/arkworks-rs/algebra/blob/master/serialize/src/lib.rs#L19

This allows the Rust to derive the canonical serialization and deserialization. e.g., https://github.com/arkworks-rs/poly-commit/blob/master/src/data_structures.rs#L107

Dec 21 '20 02:12 weikengchen

By the way, thanks a lot for the serialization system by Celo. It has helped one of my projects significantly!

Dec 21 '20 02:12 weikengchen

Hi @weikengchen thanks, I guess it's not used in algebra itself, with arrays and simple structs. I see that it's necessary for cases where there are nested structs.

That's great, but according to the above, though, the API seems like it's about to change again.

Dec 21 '20 03:12 jon-chuang

@Pratyush unfortunately it seems that your strategy doesn't work, becuase: defaults for type parameters are only allowed in struct, enum, type, or trait definitions. So I'm still looking for a nice workaround to specify default behaviour without introducing breaking changes and/or clutter.

One workaround is to have the traits Canonical(de)Serialize take a type parameter. However, this is obviously not flexible enough as within the same file, one might want to invoke e.g. serialize both with and without validation.

While ugly, it does seem that exposing the 4 functions with _unchecked is the most straitforward way to implement the desired functionality. Alternatively, while introducing breaking changes, a cleaner method might be to create an optional enum selector Option<SerializationMode> (encompassing both compression and validation procedures). While on the semantic level, rustc understands this as invoking a mode at runtime, I believe LLVM will inline the branches based on the mode invoked.

Dec 21 '20 03:12 jon-chuang

@weikengchen actually with regards to your example https://github.com/arkworks-rs/poly-commit/blob/master/src/data_structures.rs#L107, it does seem one can get rid of the ugly code bloat by implementing canonical (de)serialize for the String type.

Dec 21 '20 03:12 jon-chuang

Yes. I will take a look!

Dec 21 '20 04:12 weikengchen

Things I am currently unsure about: how to impl derive to incorporate (de)serialize_uncompressed_unchecked.

Dec 21 '20 07:12 jon-chuang

Is the problem due to whether it should default to (de)serialize_uncompressed or (de)serialize_unchecked when a specific implementation is not given?

Dec 21 '20 07:12 weikengchen

btw Jon, an alternative to prevent downstream users from having to implement all methods manually would be to have a "Mode", as outlined in https://github.com/arkworks-rs/algebra/issues/145

The idea is to have a mode enum:

enum Compress {
	Yes,
	No,
}
enum Validate {
	Yes,
	No
}

And this enum is passed into a single method

trait Serialize {
	fn serialize_with_mode<R: Read>(self, reader: R, compression: Compression, validation: Validation) -> ioResult;
	
	fn serialize_compressed(...) {
		self.serialize_with_mode(reader, Compress::Yes, Validate::Yes)
	}
	
	fn serialize_compressed_unchecked(...) {
		self.serialize_with_mode(reader, Compress::Yes, Validate::No)
	}
	
	fn serialize_uncompressed(...) {
		self.serialize_with_mode(reader, Compress::No, Validate::Yes)
	}
	
	fn serialize_uncompressed_unchecked(...) {
		self.serialize_with_mode(reader, Compress::No, Validate::No)
	}
}

So users only have to implement the single method, and everything else is implemented by default?

This should also make writing the Derive macro implementation easier.

(This can be combined with the above Validator associated type idea, or your ValidData trait idea)

Dec 21 '20 07:12 Pratyush

Incorporating the ideas above, let me propose the following design:

pub enum Compress {
	Yes,
	No,
}
pub enum Validate {
	Yes,
	No,
}

pub trait Valid {
	fn check(&self) -> Result<(), SerializationError>;
	fn batch_check(batch: &[self]) -> Result<(), SerializationError> {
		for item in batch {
			item.check()?;
		}
		Ok(())
	}
}

pub trait CanonicalSerialize {
	fn deserialize_with_mode<W: Write>(writer: W, compress: Compress) -> Result<(), SerializationError> {
		// write item to W, handling compression as per `compress`
	}
	
	fn serialize_compressed(...) {
		self.serialize_with_mode(reader, Compress::Yes)
	}
	
	fn serialize_uncompressed(...) {
		self.serialize_with_mode(reader, Compress::No)
	}
}

pub trait CanonicalDeserialize: Valid {
	fn deserialize_with_mode<R: Read>(reader: R, compress: Compress, validate: Validate) -> Result<Self, SerializationError> {
		// get item from R, handling compression as per `compress`
		if let Validate::Yes = validate {
			Self::check(&item)?
		}
		item
	}
	
	fn deserialize_compressed(...) -> ... {
		Self::serialize_with_mode(reader, Compress::Yes, Validate::Yes)
	}
	
	fn deserialize_compressed_unchecked(...) -> ... {
		self.serialize_with_mode(reader, Compress::Yes, Validate::No)
	}
	
	fn deserialize_uncompressed(...) -> ... {
		Self::deserialize_with_mode(reader, Compress::No, Validate::Yes)
	}
	
	fn deserialize_uncompressed_unchecked(...) -> ... {
		Self::deserialize_with_mode(reader, Compress::No, Validate::No)
	}
}

impl<T: Valid> CanonicalDeserialize for Vec<T> {
	fn deserialize_with_mode<R: Read>(reader: R, compress: Compress, validate: Validate) -> Self {
		// get items: Vec<T> from R, handling compression as per `compress`
		if let Validate::Yes = validate {
			T::batch_check(&items)?
		}
		items
	}
}

Jul 13 '21 15:07 Pratyush

algebra algebra copied to clipboard

Celo: Serialization changes

Description

algebra
algebra copied to clipboard