Implement an optimized SP1 Serializer

puma314 opened this issue 9 months ago

Right now, when a user calls sp1_zkvm::io::read::<T>(...), under the hood we use bincode as our serializer to read from the buffer. Reference is here.

However, using bincode as a serializer is inefficient: we don't need any compression, and we don't want to pay zkVM cycles decoding the data we read in.

We should implement our own optimized zkVM serializer that serializes/deserializes data directly, ideally in u32-aligned chunks.
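One minimal sketch of what such a serializer could look like (illustrative only, not SP1's actual implementation; the function names are hypothetical): raw u32 words are written and read back in fixed 4-byte little-endian chunks, with no encoding or compression overhead.

```rust
// Hypothetical sketch: serialize a slice of u32 words as raw little-endian
// bytes so the zkVM can read them back in aligned 4-byte chunks.
fn serialize_words(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 4);
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}

// Inverse operation: reassemble u32 words from 4-byte chunks.
fn deserialize_words(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

A real implementation would also need to handle non-word-sized fields and nested types, but the point is that the per-word cost stays constant and trivially small.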

puma314 avatar May 13 '24 22:05 puma314

@puma314 assign this to me

malik672 avatar May 17 '24 15:05 malik672

Comparison of deserialization strategies

Summary

During our performance testing and benchmarking of deserialization processes, we observed a consistent minimum of approximately 4.6 million x86 cycles consumed when using rkyv, regardless of the data size. We compare this with bincode results and explore the performance of a pointer cast operation.

Details

In our benchmarks using the rkyv and bincode libraries for deserialization, we've noted that the number of cycles consumed with rkyv does not scale with data size, showing a fixed minimum cycle count suggestive of zero-copy deserialization mechanics.

Cycle Count Comparison

The following table summarizes the cycle counts observed for deserialization using transmute, rkyv, and bincode across various data sizes:

| Data Size | transmute | rkyv | bincode |
|---|---:|---:|---:|
| Small (100KB) | 607 | 4,741,032 | 18,479,850 |
| Medium (5MB) | 548 | 4,990,638 | 681,913,198 |
| Large (50MB) | 762 | 4,638,140 | 6,730,678,512 |

Interpretation

  • Transmute: Shows minimal cycle counts across all data sizes, indicating a highly efficient direct memory mapping process.
  • rkyv: Displays consistent cycle counts around 4.6 million, supporting the hypothesis of a fixed minimum cycle count due to zero-copy mechanics.
  • bincode: Cycle counts increase significantly with data size, indicating less efficiency for larger datasets.

Additional Findings: Pointer-Cast Deserialization

In another series of tests, deserializing data via a simple pointer cast to a struct with no heap-allocated data, we observed a dramatically reduced cycle count of roughly 600 cycles, essentially independent of payload size, for the following simple struct:

```rust
#[derive(Debug)]
#[repr(C)] // required for the pointer-cast technique: pins the field layout
struct TestStruct {
    int: u8,
    string: [u8; 24],    // Mocking a fixed-size array instead of a String
    data: [u8; 100_000], // Mocking a fixed-size array instead of a Vec<u8>
}
```

This method is highly efficient but limited to scenarios where data structures are compatible with direct memory mapping. Despite also being cartoonishly dangerous, the prospect of completely zero-cost deserialization in the zkVM may be an avenue worth exploring. See this for further reading and exploration of this technique.
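For concreteness, here is a sketch of the pointer-cast technique itself (the `cast_deserialize` helper is hypothetical, and the array is shrunk from 100,000 bytes for illustration). It is sound only because every field is a plain `u8` array: alignment is 1, there is no padding, and every byte pattern is a valid value.

```rust
#[derive(Debug)]
#[repr(C)] // pin the field layout so the byte-level view is well-defined
struct TestStruct {
    int: u8,
    string: [u8; 24],
    data: [u8; 100], // shrunk from 100_000 for illustration
}

// SAFETY: the caller must guarantee `bytes` is at least
// size_of::<TestStruct>() long; all fields are u8 arrays with no
// invariants, so any byte pattern is a valid TestStruct.
unsafe fn cast_deserialize(bytes: &[u8]) -> &TestStruct {
    assert!(bytes.len() >= std::mem::size_of::<TestStruct>());
    &*(bytes.as_ptr() as *const TestStruct)
}
```

For structs containing fields with alignment greater than 1 (e.g. `u32`), the buffer's alignment would also have to be checked before casting.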

Further, the include_bytes_aligned! macro may also satisfy the requirement of reading u32-aligned machine words from bytes embedded within the RISC-V binary.
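A hand-rolled version of the same idea (the macro itself comes from a third-party crate) is to wrap the embedded bytes in a type that forces 4-byte alignment, then view them as u32 words without copying:

```rust
// Force 4-byte alignment on an embedded byte array. In real use the
// inner array would come from include_bytes!("..."); here it is inlined.
#[repr(C, align(4))]
struct Aligned<const N: usize>([u8; N]);

static WORDS: Aligned<8> = Aligned([1, 0, 0, 0, 2, 0, 0, 0]);

// Reinterpret an aligned, length-multiple-of-4 byte slice as u32 words.
// Note: the resulting values are native-endian (little-endian on both
// x86 and the RISC-V target).
fn as_u32_slice(bytes: &[u8]) -> &[u32] {
    assert_eq!(bytes.as_ptr() as usize % std::mem::align_of::<u32>(), 0);
    assert_eq!(bytes.len() % 4, 0);
    unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const u32, bytes.len() / 4)
    }
}
```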

Efficiency Considerations for Smaller Datasets

While rkyv demonstrates excellent efficiency for large datasets with its low and consistent cycle count, it's important to note that bincode may be more efficient for scenarios where deserialization consumes less than ~4.6 million cycles. This is particularly relevant for smaller types where the overhead of rkyv's zero-copy mechanics does not offset its benefits.

Proposed Strategy

Given the differences in performance characteristics, a possible strategy is to offer different ways to read and write data into the zkVM, best suited for the task at hand:

  • Use bincode for smaller types and data where the lower absolute number of cycles leads to faster operations.
  • Use rkyv for larger types and data, where its fixed overhead is amortized across the payload.

This approach allows users to optimize their applications by selecting the most appropriate serialization method, potentially improving performance and efficiency across various use cases.
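The dispatch itself could be as simple as a size-based heuristic. A hypothetical sketch (the threshold below is illustrative only; the real crossover corresponding to rkyv's ~4.6M-cycle fixed cost would need to be measured per workload):

```rust
#[derive(Debug, PartialEq)]
enum Strategy {
    Bincode, // lower absolute cost for small payloads
    Rkyv,    // fixed cost amortized over large payloads
}

// Pick a serializer from an approximate payload size.
fn choose_strategy(approx_size_bytes: usize) -> Strategy {
    const THRESHOLD_BYTES: usize = 1 << 20; // illustrative 1 MiB crossover
    if approx_size_bytes < THRESHOLD_BYTES {
        Strategy::Bincode
    } else {
        Strategy::Rkyv
    }
}
```

In practice the choice would more likely be exposed to users as separate read/write APIs rather than decided automatically.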

Dustin-Ray avatar May 31 '24 02:05 Dustin-Ray