gnark icon indicating copy to clipboard operation
gnark copied to clipboard

perf: (de)serialization revisit

Open gbotrel opened this issue 2 years ago • 3 comments

Constraint system use cbor. Witness / Proving & Verifying keys use binary encoding.

For large circuits, deserialization cost is very visible; this topic need some careful revisit (maybe allow some unsafe options when portability is not important), with some forward thinking to ensure stability over the next releases.

gbotrel avatar Mar 14 '23 15:03 gbotrel

Cursory look, there are a lot of reasons not to use ASN1. The standard is complex, implementations are old and clunky. There even has been instances of malicious code injected into it with no one noticing because the code is so complex no one wants to look at it. It is even sometimes less performant than text-based encoding + compression.

Protobufs seems like a better option with one major downside: It doesn't handle deterministic serialization very well. As long as we are happy with not being able to use any of this serialized data in a checksum or fiatshamir challenge, we'll be good.

Tabaie avatar Mar 15 '23 16:03 Tabaie

Couple of things to keep in mind;

  • when we encode points, we want to have the option to encode them compressed or not (X coordinate + parity bit)
  • some libraries force a code-gen phase; which likely forces us to adapt our go-struct to avoid copies at serialization / deserialization
  • msgpack or flatbuffer could also be good candidates. Protobufs performed worsed than cbor for large circuits. Didn't even try it for things we do in binary (provingkeys, witness ...) .
  • in some scenarios, we may want 2 options; one that's safe & portable across the network, and one that's unsafe and very fast. Like dumping the memory to disk or doing mmap. Typically, in the zkEVM architecture, hot reload of the process using gnark is a thing, and it will need to read the proving key, the circuit very fast.

gbotrel avatar Mar 15 '23 18:03 gbotrel

Cursory look, there are a lot of reasons not to use ASN1. The standard is complex, implementations are old and clunky. There even has been instances of malicious code injected into it with no one noticing because the code is so complex no one wants to look at it. It is even sometimes less performant than text-based encoding + compression.

Protobufs seems like a better option with one major downside: It doesn't handle deterministic serialization very well. As long as we are happy with not being able to use any of this serialized data in a checksum or fiatshamir challenge, we'll be good.

I did ASN1 a lot a few years ago. encoding/asn1 is easier to use, but slow and can give unexpected results. golang.org/x/crypto/cryptobyte is better, but needs a lot of manual work so that it works nicely without excessive allocations. For stability and nice binary encodin ASN1 would be great, but it wouldn't just be a drop-in. A nuisance with ASN1 DER encoding is that we need to know the size of the fields before writing (the value header contains the length of the value) and this may make encoding large circuits difficult (cannot stream directly to disk/network).

Imo protobufs required some spec definition and code-generation. ? And I don't know how it is now, but at some point importing protobuf dragged in a lot of dependencies, making builds really slow. But this may have changed now. I would avoid protobuf if possible because we would first have to define the spec and then compile types out of it. Or we may have two different specifications of same abstract data structures (Protobuf def for serializing and internal def for operations), but it becomes really annoying to maintain.

I had also tried Capn'proto. It is claimed to be really fast, but its usage was difficult. Imo nothing was deserialized by default and only if you use some value, you'll deserialize it. The laziness deserialization was efficient, but also very error-prone.

To be honest, I like the two-tier approach:

  • for unsafe usage we just do a memory dump.
  • for safe usage we have some very simple binary marshalling/unmarshalling (a la "len(field) || field"). This allows us to be explicit about the binary formats. I think this actually won't be too much more overhead than trying to figure out cbor. I have spent days trying to understand how to write custom CBOR encoders/decoders for interface types.

ivokub avatar Mar 15 '23 22:03 ivokub