
arkworks integration

Open achimcc opened this issue 2 years ago • 16 comments

  • What does it do? Arkworks integration into Substrate. We provide elliptic curves (bls12_381, bls12_377, ed_on_bls12_381, ed_on_bls12_377, bw6_761) where we replace the compute-intensive operations with host function calls in Substrate. We add those host function calls to sp_io::elliptic_curves and refer to the arkworks crate. The curves live in https://github.com/paritytech/ark-substrate and call into the host functions. To avoid point preparation of the elliptic curves in the runtime, we also partially fork the bls12 and bw6 models in ark-substrate. (A minimal sketch of such a host-function declaration follows these notes.)

  • What important points should reviewers know? This is using the https://github.com/paritytech/ark-substrate repo, which contains the curve and model code. ~~We cover tests in the arkworks-curves crate by using the arkworks algebra test-templates.~~ Tests are now covered by pulling in https://github.com/paritytech/ark-substrate, where we implement the arkworks curves, which receive the sp-io host function implementations by dependency injection. We can re-use the arkworks-rs algebra test-templates for testing.

An overview and comparison of benchmark results can be found here.

  • Is there something left for follow-up PRs? It might be possible to avoid forking the models. Adding more curves and replacing more operations with host function calls are further possible optimisations.
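
For illustration, here is a minimal sketch (not the PR's exact code) of how such a host function can be declared with sp-runtime-interface, assuming ark-bls12-381 and ark-ec 0.4; the actual names and signatures under sp_io::elliptic_curves may differ.

use sp_runtime_interface::runtime_interface;

// An `EllipticCurves` trait like this would generate an `elliptic_curves` host module.
#[runtime_interface]
pub trait EllipticCurves {
    /// Multi-Miller loop on BLS12-381 over arkworks-serialized G1/G2 elements.
    /// The body below is the native implementation; from Wasm the call is routed
    /// to the host instead of being executed inside the runtime.
    fn bls12_381_multi_miller_loop(g1: Vec<u8>, g2: Vec<u8>) -> Result<Vec<u8>, ()> {
        use ark_bls12_381::{Bls12_381, G1Affine, G2Affine};
        use ark_ec::pairing::Pairing;
        use ark_serialize::{CanonicalDeserialize, CanonicalSerialize};

        let g1 = Vec::<G1Affine>::deserialize_uncompressed_unchecked(g1.as_slice()).map_err(|_| ())?;
        let g2 = Vec::<G2Affine>::deserialize_uncompressed_unchecked(g2.as_slice()).map_err(|_| ())?;
        let miller_out = Bls12_381::multi_miller_loop(g1, g2);
        let mut out = Vec::new();
        miller_out.0.serialize_uncompressed(&mut out).map_err(|_| ())?;
        Ok(out)
    }
}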

  • [x] Labels: You labeled the PR appropriately if you have permissions to do so:

    • [x] A* for PR status (one required)
    • [x] B* for changelog (one required)
    • [x] C* for release notes (exactly one required)
    • [x] D* for various implications/requirements
    • [x] Github project assignment
  • [x] Related Issues: You mentioned a related issue if this PR is related to it, e.g. Fixes #228 or Related #1337.

  • [x] 2 Reviewers: You asked at least two reviewers to review. If you aren't sure, start with GH suggestions.

  • [ ] Style Guide: Your PR adheres to the style guide

    • In particular, mind the maximal line length of 100 (120 in exceptional circumstances).
    • There is no commented code checked in unless necessary.
    • Any panickers in the runtime have a proof or were removed.
  • [ ] Runtime Version: You bumped the runtime version if there are breaking changes in the runtime.

  • [ ] Docs: You updated any rustdocs which may need to change.

  • [ ] Polkadot Companion: Has the PR altered the external API or interfaces used by Polkadot?

    • [ ] If so, do you have the corresponding Polkadot PR ready?
    • [ ] Optionally: Do you have a corresponding Cumulus PR?

achimcc avatar Dec 29 '22 18:12 achimcc

(Maybe) dumb questions:

  1. Why doesn't the upstream version satisfy our use case? What patches were applied to the upstream? If I understood correctly, some computationally intensive tasks are offloaded to host functions? What are these tasks/functions?
  2. Why do we need to integrate all this stuff into Substrate? IMO it is (in general) not the best idea to embed an entire lib in an (already giant) codebase.
  3. Where is all this cryptographic machinery required? I mean, I see that some host functions were implemented to use these primitives. But then, as far as I can see, there is no usage of these functions. What is the end goal?

As a side note, we don't directly implement any cryptographic primitive in Substrate, we use external crates.

davxy avatar Jan 07 '23 10:01 davxy

(Maybe) dumb questions:

  1. Why doesn't the upstream version satisfy our use case? What patches were applied to the upstream? If I understood correctly, some computationally intensive tasks are offloaded to host functions? What are these tasks/functions?
  2. Why do we need to integrate all this stuff into Substrate? IMO it is (in general) not the best idea to embed an entire lib in an (already giant) codebase.
  3. Where is all this cryptographic machinery required? I mean, I see that some host functions were implemented to use these primitives. But then, as far as I can see, there is no usage of these functions. What is the end goal?

As a side note, we don't directly implement any cryptographic primitive in Substrate, we use external crates.

  1. The operations on elliptic curves are way too slow in WebAssembly. We have to replace the costly operations with host function calls. The host function tasks are the arithmetic operations on elliptic curves which are slow in WebAssembly: computation of bilinear pairings (ate pairing), multi-scalar multiplication, and affine and projective multiplications.
  2. I'll implement it this way now. As explained above, it is a lot more work since the curves have to call into host functions. If we want to make the host functions available to them, we have to pass them by dependency injection, and that requires a lot of modifications to the arkworks code.
  3. Those are the curves used by arkworks-rs, which make all the tools from this library available to every Substrate pallet. Direct examples are zero-knowledge proof verification, especially verification of PLONK or Groth16 proofs. I created a simple example of a Groth16 proof verification in a Substrate pallet which uses these curves (based on my PR branch of Substrate) here: https://github.com/achimcc/substrate-groth16 (see the sketch below). Other applications include Ring VRFs, which are interesting in their own right, but will also become a basic building block of Sassafras consensus. @burdges might comment more on use-cases.
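
A minimal sketch of what such a Groth16 verification could look like, assuming the ark-groth16 and ark-bls12-381 0.4 APIs; in the PR the curve types would instead come from ark-substrate so that the pairings and MSMs inside route through the host functions. This is not the code of the linked example pallet.

use ark_bls12_381::{Bls12_381, Fr};
use ark_groth16::{prepare_verifying_key, Groth16, Proof, VerifyingKey};

/// Verify a Groth16 proof over BLS12-381; with the ark-substrate curves the
/// expensive pairing work inside ends up as host function calls.
fn verify_groth16(vk: &VerifyingKey<Bls12_381>, proof: &Proof<Bls12_381>, public_inputs: &[Fr]) -> bool {
    let pvk = prepare_verifying_key(vk);
    Groth16::<Bls12_381>::verify_proof(&pvk, proof, public_inputs).unwrap_or(false)
}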

achimcc avatar Jan 08 '23 08:01 achimcc

@achimcc I'm very interested since I'm working on the Sassafras implementation :-)

  1. The "prove" component will always run in native client code in order to produce ring membership proofs (to be sent onchain along with tickets).
  2. The "verify" is indeed the critical component. Since it may be very computational intensive (and maybe also no no-std... but I don't know this for sure) and run by the runtime when checking for submitted tickets before saving them onchain.

But can't we offer a host function (i.e. verify) doing the overall verification job? I mean, why should we have to run part of the verification code in Wasm at all? I think that a verify host function would be a good interface in order to maintain independence from the actual library. Is there some detail I'm missing that prevents us from doing it?

davxy avatar Jan 08 '23 10:01 davxy

@davxy We also want to use the arkworks-rs stuff in other contexts. @burdges said a complete native-code implementation of a verify function is not what we want, but I have to admit that I don't know the reasoning behind it.

achimcc avatar Jan 08 '23 11:01 achimcc

@burdges why don't we want a full native verify?

davxy avatar Jan 09 '23 08:01 davxy

@burdges why don't we want a full native verify?

AFAIK the idea is to use arkworks as a building library to compose different kinds of cryptographic algorithms. Adding one verify for each cryptographic algorithm isn't such a great idea in the long term.

bkchr avatar Jan 09 '23 09:01 bkchr

This all builds and all tests pass! If no one complains, I will just wait for the official arkworks-rs 0.4.0 release and will then mark it as Ready for review!

achimcc avatar Jan 13 '23 10:01 achimcc

We could have a thinner Substrate change @bkchr, but then have an external repo that implements the runtime code for the curve models and the curves themselves. I didn't ask for this initially because I figured it'd complicate doing tests, but Achim said that's not really true. I agree it's a good idea now that I think about it:

It's true, we need it to be easy to translate PRs from arkworks to our forked code. It's easy-ish to do this for curves, which live in the flat repo https://github.com/arkworks-rs/curves/ so we could have a similar flat sub-ark-curves repo perhaps.

It's trickier to do curve models as they're nested into the ark-ec crate https://github.com/arkworks-rs/algebra/tree/master/ec/src/models so I figured this winds up being roughly git diff and git apply, but afaik the curve models could live in the same sub-ark-curves repo.

We'll upgrade only intermittently I guess, so really I think sub-ark-curves being separate from Substrate itself actually benefits us, in that parachains can upgrade sub-ark-curves and all their cryptography dependencies independently from Substrate itself. We could even have testing infrastructure that uses all supported versions of sub-ark-curves.


Arkworks has been generous in permitting our changes into their traits, and we've tried to keep our changes in line with what GPU and FPGA projects require, but arkworks' success requires it being kind of a "standard library". We also want zcash's traits to work via these same host calls, but they'll never permit even the slightest code smell into their traits, so..

We should think carefully about any changes to the arkworks trait hierarchy. In fact, we almost got G2Prepared serialization merged, which risks incompatibility with some optimizations we like.

burdges avatar Jan 13 '23 17:01 burdges

We could have a thinner Substrate change @bkchr, but then have an external repo that implements the runtime code for the curve models and the curves themselves. I didn't ask for this initially because I figured it'd complicate doing tests, but Achim said that's not really true. I agree it's a good idea now that I think about it:

It's true, we need it to be easy to translate PRs from arkworks to our forked code. It's easy-ish to do this for curves, which live in the flat repo https://github.com/arkworks-rs/curves/ so we could have a similar flat sub-ark-curves repo perhaps.

It's trickier to do curve models as they're nested into the ark-ec crate https://github.com/arkworks-rs/algebra/tree/master/ec/src/models so I figured this winds up being roughly git diff and git apply, but afaik the curve models could live in the same sub-ark-curves repo.

We'll upgrade only intermittently I guess, so really I think sub-ark-curves being separate from Substrate itself actually benefits us, in that parachains can upgrade sub-ark-curves and all their cryptography dependencies independently from Substrate itself. We could even have testing infrastructure that uses all supported versions of sub-ark-curves.

We have a forked curve repo already which contains the forked arkworks curves: https://github.com/achimcc/ark-substrate

I'm using it for this PR!

achimcc avatar Jan 13 '23 17:01 achimcc

Ahh, you've even got the models there, cool.

burdges avatar Jan 13 '23 17:01 burdges

We have a funny testing duality: the curves repo runs its tests against the host functions in Substrate, and Substrate runs its tests against the curves repo. In both cases we use the curves with host function implementations.

achimcc avatar Jan 13 '23 17:01 achimcc

Is there any need to have the host functions in Substrate? From Substrate itself there is no need for the host functions to be part of this repo. Host functions can be defined anywhere.

bkchr avatar Jan 15 '23 22:01 bkchr

AFAIK Sassafras consensus will use Ring VRFs, and with this also arkworks and the host functions. I was assuming that the consensus layer needs to be part of the Substrate core repo?

achimcc avatar Jan 16 '23 03:01 achimcc

Good point! If this is the case, we can add it here :)

A general note: the arkworks stuff should go into some special "namespace/trait".

bkchr avatar Jan 16 '23 13:01 bkchr

Good point! If this is the case, we can add it here :)

A general note: the arkworks stuff should go into some special "namespace/trait".

Done!

achimcc avatar Jan 16 '23 20:01 achimcc

Around Vec<Vec<u8>>, I'd likely have written something like this, maybe upstreamed it to arkworks, but I'm not sure it's the best choice. In this, we risk reallocations, but that's likely better than all the allocations in Vec<Vec<u8>>.

use ark_ff::Zero;
use ark_serialize::{CanonicalDeserialize, CanonicalSerialize, SerializationError};
use ark_std::io::Cursor;

fn serialize_iter_to_vec<T>(iter: impl IntoIterator<Item = T>) -> Result<Vec<u8>, SerializationError>
where T: CanonicalSerialize + Zero
{
    let iter = iter.into_iter();
    let element_size = T::zero().uncompressed_size();  // Just a guess at the per-element size
    let length_guess = iter.size_hint().0;  // Just a guess at the element count
    let mut w = Cursor::new(Vec::with_capacity(8 + element_size * length_guess));
    // Reserve space for the length prefix; it gets overwritten once the real count is known.
    0u32.serialize_uncompressed(&mut w)?;
    let mut length = 0u32;
    for elm in iter {
        elm.serialize_uncompressed(&mut w)?;
        length += 1;
    }
    let mut result = w.into_inner();
    // Patch the actual element count into the reserved prefix.
    length.serialize_uncompressed(&mut result[..4])?;
    Ok(result)
}

fn deserialize_iter_to_vec<T>(mut bytes: &[u8]) -> Result<Vec<T>, SerializationError>
where T: CanonicalDeserialize
{
    let length = u32::deserialize_uncompressed_unchecked(&mut bytes)?;
    let mut result = Vec::with_capacity(length as usize);
    for _ in 0..length {
        result.push(T::deserialize_uncompressed_unchecked(&mut bytes)?);
    }
    Ok(result)
}

burdges avatar Mar 18 '23 15:03 burdges

Around Vec<Vec<u8>>, I'd likely have written something like this, maybe upstreamed it to arkworks, but I'm not sure it's the best choice. In this, we risk reallocations, but that's likely better than all the allocations in Vec<Vec<u8>>.


Okay, I replaced all usages of Vec<Vec<u8>> with Vec<u8> by using .chunks() and flat_map(). But I'm not convinced that this yields any performance improvement, since we need to split into chunks (after determining the chunk size) and then convert each chunk into a Vec<u8> by calling .to_vec().
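
As a rough illustration of the fixed-chunk-size approach described above (these are not the PR's actual helpers): flatten equally-sized serialized elements into one buffer and split it back with .chunks(), assuming every element serializes to the same uncompressed size.

use ark_serialize::{CanonicalDeserialize, CanonicalSerialize, SerializationError};

fn flatten<T: CanonicalSerialize>(items: &[T]) -> Result<Vec<u8>, SerializationError> {
    let mut out = Vec::new();
    for item in items {
        // All elements of one curve/field type share the same uncompressed size.
        item.serialize_uncompressed(&mut out)?;
    }
    Ok(out)
}

fn unflatten<T: CanonicalDeserialize>(bytes: &[u8], element_size: usize) -> Result<Vec<T>, SerializationError> {
    bytes
        .chunks(element_size)
        .map(|chunk| T::deserialize_uncompressed_unchecked(chunk))
        .collect()
}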

achimcc avatar Mar 18 '23 17:03 achimcc

We do have equal chunk sizes when sending elliptic curve points or field elements, which makes sense for arkworks, but breaks in general, and thus feels fragile here too. I wrote *_iter_to_vec to work even if the serialized size winds up variable.

burdges avatar Mar 18 '23 21:03 burdges

We do have equal chunk sizes when sending elliptic curve points or field elements, which makes sense for arkworks, but breaks in general, and thus feels fragile here too. I wrote *_iter_to_vec to work even if the serialized size winds up variable.

I tried your approach, but it was not so easy to implement in a generic way. I finally reverted back to Vec<Vec<u8>> for now. I still think it's the easiest and also the fastest solution.

achimcc avatar Mar 19 '23 13:03 achimcc

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/zk-rollups-as-parachains/2229/13

Polkadot-Forum avatar Apr 05 '23 11:04 Polkadot-Forum

I'm now happy with this I think.. I'd like @swasilyev to take one more look too.

We could migrate this to BLST in the future, but frankly I think speeding up arkworks makes more sense. It'd be good if someone like @drskalman reviewed the mappings in https://github.com/MystenLabs/fastcrypto/tree/main/fastcrypto-zkp/src to figure out if they represent any significant future changes. I don't think these hold up placing this into testing, but maybe another review before it goes to Kusama of course.

burdges avatar May 01 '23 15:05 burdges

Would it be possible to use strong types instead of Vec<u8> at the sp-arkworks level? Currently sp-arkworks expects arguments to be SCALE-serialized. Typically we have serialization implemented at the runtime interface level.

arkpar avatar May 12 '23 11:05 arkpar

I still don't see any reason why we would need to merge this into the main repo right now. For experimentation purposes it can be in its own repo.

bkchr avatar May 12 '23 19:05 bkchr

What would a separate repo look like? Can we have test nets running it?

At a high level, these are the "opinionated" decisions I made here:

  • We do no host calls below single scalar multiplications.
  • We provide single scalar multiplications in projective coordinates, but not affine coordinates (which is what ETH does). I asked folks if they foresaw projective ever changing, but crickets, so likely this incurs no maintenance burden. It might incur some tiny CPU time cost, but this remains invisible in the messy benchmarks. It's probably an optimization in general. It's probably fine otherwise due to..
  • We provide multi-scalar multiplications with inputs in affine coordinates, which likely meshes better with how things get provided over the wire.
  • We've no deserialization subgroup checks in any of these calls yet, which imposes a hefty penalty upon the runtime, but afaik nothing fancy we'd do here would be universal anyways.
  • We provide pairings as a separated final exponentiation and Miller loop, which meshes well with how everything descended from zcash works, and provides a natural approach (see the sketch after this list). As Miller loop outputs do not lie in G_T, you cannot do optimized G_T arithmetic directly upon the results, but you can do GF(q^k) arithmetic. We'll provide optimized G_T arithmetic soon too, but never optimize GF(q^k) arithmetic.
  • We incorporate the G2 point preparation into the Miller loop because the preparation output is 16 kB. Also, Substrate cannot yet pass in-memory unserializable data between PVF entry points. We should pass in-memory unserializable data between PVF entry points of course, ala https://github.com/w3f/PPPs/pull/11#issuecomment-1537339530, but even then doing point preparation under the hood still permits caching optimizations, albeit strange ones for determinism.
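
A small sketch of the separated pairing mentioned above, assuming ark-ec 0.4's Pairing trait and ark-bls12-381; the two host calls mirror these two steps, with G2 preparation happening inside the Miller loop.

use ark_bls12_381::{Bls12_381, G1Affine, G2Affine};
use ark_ec::pairing::{Pairing, PairingOutput};

/// Pairing as two host-call-shaped steps: multi-Miller loop, then final exponentiation.
/// The Miller loop output lives in GF(q^k), not in G_T, until the final exponentiation.
fn pairing_in_two_steps(g1: G1Affine, g2: G2Affine) -> Option<PairingOutput<Bls12_381>> {
    let miller_out = Bls12_381::multi_miller_loop([g1], [g2]);
    Bls12_381::final_exponentiation(miller_out)
}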

We've not yet done this part:

  • We should provide G_T arithmetic, but basically it wants optimizations not available for GF(p^k) overall. We'd thus do G_T but not GF(p^k) host calls. I'd just like to understand the design space slightly better.

I judge my above answers become iffy on these points:

  • Should a G2 point passed into a Miller Loop be prefixed by a 0u8 so other flavors could exist later? I'd say add more host calls if we ever figure this out.
  • Should we adopt BLST initially? Ala https://github.com/MystenLabs/fastcrypto/tree/main/fastcrypto-zkp/src I'd vote no, upstream optimizations to Arkworks.

Achim could write the above into https://github.com/w3f/PPPs/pull/8 probably.

burdges avatar May 12 '23 21:05 burdges

It's fine anyway if we avoid this being reachable on Kusama or Polkadot by whatever means, like doing a separate test net. We should imho avoid doing bespoke calls for higher-level stuff like the BLS in BEEFY or Sassafras, though.

burdges avatar May 12 '23 21:05 burdges

By test net, do you mean a relay chain test network where parachain validation can use these host functions? Or do you mean any kind of Substrate-based test network?

bkchr avatar May 12 '23 21:05 bkchr

Would it be possible to use strong types instead of Vec<u8> at the sp-arkworks level? Currently sp-arkworks expects arguments to be SCALE-serialized. Typically we have serialization implemented at the runtime interface level.

@burdges has implemented Ark-SCALE; I think we should move it to the SCALE repo and use it here. We also need it for serializing the KZG commitment to the public keys in BEEFY.
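
For reference, a hedged sketch of how Ark-SCALE is typically used, assuming the ark-scale crate's ArkScale wrapper together with parity-scale-codec (imported as codec); the exact API may have shifted.

use ark_bls12_381::G1Affine;
use ark_scale::ArkScale;
use codec::{Decode, Encode};

/// SCALE-encode a batch of arkworks points and decode them back.
fn roundtrip(points: Vec<G1Affine>) -> Option<Vec<G1Affine>> {
    let wrapped: ArkScale<Vec<G1Affine>> = points.into();
    let bytes = wrapped.encode();
    ArkScale::<Vec<G1Affine>>::decode(&mut &bytes[..]).ok().map(|unwrapped| unwrapped.0)
}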

drskalman avatar May 12 '23 22:05 drskalman

We're not worried about exposing these to PVFs right away.

We'll want them for PVFs eventually since that'll help promote polkadot, but how fast that switch gets flipped depends.

In particular, all of these use variable-time arithmetic, like every signature verification primitive on every blockchain. You'd typically compute weights using an expectation of scalars with half 1s and half 0s, but if someone gives you all 1s, then it'll take like 50% longer, so your weights may be wrong, and your block may be rejected. Imho this is yet another reason to use these sorts of primitives to build batch verifiers, which make achieving many 1s much harder.

burdges avatar May 12 '23 22:05 burdges

I think we should move it to the SCALE Repo and use it here. We also need it for serializing the KZG commitment to the public keys in BEEFY.

We already do, but I think @arkpar means something serialized but representative of the semantics, like pub struct SerializedProjective(pub Vec<u8>). I've no opinion on whether this matters buried so deep, nor what you do to make SCALE handle it efficiently, but it's likely harmless if done right.

A question is whether there is something more efficient than Vec<u8>, like an actual &[u8] type, but often here we could not do borrows anyway since we have data presented by iterators or whatever.
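
One hedged sketch of what such a strong type could look like at the runtime-interface boundary, assuming parity-scale-codec (as codec) and sp-runtime-interface's PassByCodec derive; the name is only illustrative.

use codec::{Decode, Encode};
use sp_runtime_interface::pass_by::PassByCodec;

/// A serialized projective point. The newtype documents the semantics at the
/// interface while the wire format stays the same plain bytes underneath.
#[derive(Clone, Encode, Decode, PassByCodec)]
pub struct SerializedProjective(pub Vec<u8>);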

burdges avatar May 12 '23 22:05 burdges

We already do but I think @arkpar means something serialized but representative of the semantics, like pub struct SerializedProjective(pub Vec<u8>).

My understanding is that @arkpar prefers that if you are sending a projective point, the host API knows about and expects a projective point, e.g. of type CurveGroup, with serialization handled completely transparently by the fact that the projective point (CurveGroup or AffineRepr) implements SCALE. That is my understanding of stronger typing.

In BEEFY we directly include (G1Affine, G1Affine) in the BeefyAuthoritySet struct for the commitment, whereas we could have used (Vec, Vec) and not told Substrate anything about its internals.

drskalman avatar May 12 '23 22:05 drskalman