Implement an optimized SP1 Serializer
Right now, when a user does `sp1_zkvm::io::read::<T>(...)`, under the hood we use `bincode` as our serializer to read from the buffer. Reference is here. However, using `bincode` as a serializer is inefficient: we don't want any compression, and we don't want to pay zkVM cycles just for reading in data. We should implement our own optimized zkVM serializer that serializes/deserializes data, ideally in `u32`-aligned chunks.
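To make the goal concrete, here is a minimal sketch of what word-aligned serialization could look like: bytes are packed into little-endian `u32` words (zero-padding the final partial word) so the zkVM can consume whole machine words instead of individual bytes. The function names are illustrative, not SP1's actual API.

```rust
// Hypothetical sketch: pack bytes into u32 words, zero-padding the tail.
fn serialize_u32_aligned(bytes: &[u8]) -> Vec<u32> {
    let mut words = Vec::with_capacity((bytes.len() + 3) / 4);
    for chunk in bytes.chunks(4) {
        let mut buf = [0u8; 4]; // zero-pad the final partial word
        buf[..chunk.len()].copy_from_slice(chunk);
        words.push(u32::from_le_bytes(buf));
    }
    words
}

// Inverse: unpack words back into bytes, dropping the padding.
fn deserialize_u32_aligned(words: &[u32], len: usize) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(words.len() * 4);
    for w in words {
        bytes.extend_from_slice(&w.to_le_bytes());
    }
    bytes.truncate(len);
    bytes
}

fn main() {
    let data = b"hello zkvm".to_vec();
    let words = serialize_u32_aligned(&data); // 10 bytes -> 3 words
    assert_eq!(deserialize_u32_aligned(&words, data.len()), data);
}
```

A real implementation would also need to handle typed values (not just raw byte slices), but the key property is the same: every read from the input buffer is a whole aligned word.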
@puma314 assign this to me
Comparison of deserialization strategies
Summary
During our performance testing and benchmarking of deserialization processes, we observed a consistent minimum of approximately 4.6 million x86 cycles consumed when using rkyv, regardless of the data size. We compare this with bincode results and explore the performance of a pointer cast operation.
Details
In our benchmarks using the rkyv and bincode libraries for deserialization, we've noted that the number of cycles consumed with rkyv does not scale with data size, showing a fixed minimum cycle count suggestive of zero-copy deserialization mechanics.
Cycle Count Comparison
The following table summarizes the cycle counts observed for deserialization using transmute, rkyv, and bincode across various data sizes:
| Data Size | transmute | rkyv | bincode |
|---|---|---|---|
| Small (100KB) | 607 | 4,741,032 | 18,479,850 |
| Medium (5MB) | 548 | 4,990,638 | 681,913,198 |
| Large (50MB) | 762 | 4,638,140 | 6,730,678,512 |
Interpretation
- Transmute: Shows minimal cycle counts across all data sizes, indicating a highly efficient direct memory mapping process.
- rkyv: Displays consistent cycle counts around 4.6 million, supporting the hypothesis of a fixed minimum cycle count due to zero-copy mechanics.
- bincode: Cycle counts increase significantly with data size, indicating less efficiency for larger datasets.
Additional Findings: Pointer-Cast Deserialization
In another series of tests, deserializing data via a simple pointer cast to a struct with no heap-allocated data, we observed a dramatically reduced cycle count of roughly 600 cycles per 100,000 bytes, for the following simple struct:
```rust
#[derive(Debug)]
struct TestStruct {
    int: u8,
    string: [u8; 24],    // Mocking a fixed-size array instead of a String
    data: [u8; 100_000], // Mocking a fixed-size array instead of a Vec<u8>
}
```
This method is highly efficient but limited to scenarios where data structures are compatible with direct memory mapping. Despite also being cartoonishly dangerous, the prospect of completely zero-cost deserialization in the zkVM may be an avenue worth exploring. See this for further reading and exploration of this technique.
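For illustration, the pointer-cast "deserialization" described above amounts to reinterpreting a byte buffer as the struct in place. This sketch works only because the struct contains no heap pointers and we pin the layout with `#[repr(C)]`; casting an undersized or misaligned buffer this way is undefined behavior, hence "cartoonishly dangerous".

```rust
// All fields are u8 arrays, so the struct has alignment 1 and no padding.
#[repr(C)]
#[derive(Debug)]
struct TestStruct {
    int: u8,
    string: [u8; 24],
    data: [u8; 100_000],
}

fn main() {
    // A buffer of exactly size_of::<TestStruct>() bytes, as if read from input.
    let mut bytes = vec![0u8; std::mem::size_of::<TestStruct>()];
    bytes[0] = 42; // the `int` field

    // "Deserialize" by casting the buffer pointer to the struct type.
    // Sound here only because size, alignment, and layout all line up.
    let s: &TestStruct = unsafe { &*(bytes.as_ptr() as *const TestStruct) };
    assert_eq!(s.int, 42);
}
```

No bytes are copied or parsed, which is why the cycle count stays in the hundreds regardless of payload size.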
Further, the `include_bytes_aligned!` macro may also satisfy the requirement of reading `u32`-aligned machine words from bytes embedded within the RISC-V binary.
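The underlying trick such macros rely on can be sketched without the macro itself: wrap the embedded bytes in a `#[repr(align(4))]` struct so they are guaranteed to start on a `u32` boundary. The `Aligned` wrapper and the literal blob below are illustrative stand-ins (a real build would use `include_bytes!` for the payload).

```rust
// Force 4-byte alignment on an embedded byte array.
#[repr(C, align(4))]
struct Aligned<const N: usize>([u8; N]);

// Stand-in for `include_bytes!("blob.bin")`: two little-endian u32 words.
static BLOB: Aligned<8> = Aligned([1, 0, 0, 0, 2, 0, 0, 0]);

fn main() {
    let ptr = BLOB.0.as_ptr();
    assert_eq!(ptr as usize % 4, 0); // alignment is guaranteed by repr(align(4))

    // Safe to view as u32 words: length is a multiple of 4 and the pointer is aligned.
    let words: &[u32] =
        unsafe { std::slice::from_raw_parts(ptr as *const u32, BLOB.0.len() / 4) };
    assert_eq!(words[0], u32::from_ne_bytes([1, 0, 0, 0]));
    assert_eq!(words[1], u32::from_ne_bytes([2, 0, 0, 0]));
}
```

Without the alignment attribute, `include_bytes!` only yields a `&[u8]` with alignment 1, so a word-sized view would not be sound.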
Efficiency Considerations for Smaller Datasets
While rkyv demonstrates excellent efficiency for large datasets with its low and consistent cycle count, bincode may be more efficient in scenarios where its total deserialization cost falls below rkyv's ~4.6 million-cycle floor. This is particularly relevant for smaller types, where the fixed overhead of rkyv's zero-copy mechanics is not offset by its benefits.
Proposed Strategy
Given the differences in performance characteristics, a possible strategy is to offer different ways to read and write data into the zkVM, best suited for the task at hand:
- Use bincode for smaller types and data where the lower absolute number of cycles leads to faster operations.
- Use rkyv for larger types and data, where its fixed ~4.6 million-cycle overhead is amortized.
This approach allows users to optimize their applications by selecting the most appropriate serialization method, potentially improving performance and efficiency across various use cases.