[sui_archival] File disk format for archival storage of ledger history and state snapshot
This PR describes the on-disk format for storing ledger history (txns, effects, etc.) as well as state snapshots (objects). More details on the design can also be found here
Thanks for the feedback, @mystenmark. To answer your overall question: the main motivation for building this was to be able to stream payloads from remote storage at various start and end offsets without downloading the whole file (one checkpoint directory contains blobs for multiple checkpoints). This could perhaps be accomplished with serde.rs on local disk, but I am not sure how to make it work with S3-like APIs (S3 has APIs to stream a file starting at an offset). Besides, serde.rs will serialize to a format that is strongly tied to Rust (I understand bincode is Rust too, but some node operators may still want a service written in a different language that can parse these files and serve blobs). The proposed blob format is clean and simple to understand, and it has provision for data integrity checks, which serde would not provide.
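To make the offset-based access concrete, here is a minimal sketch of the idea, not the format defined in this PR. It assumes a hypothetical layout of `[u32 LE length][payload][u32 LE checksum]` per blob, uses FNV-1a purely as a std-only stand-in for a real integrity check, and uses an in-memory `Cursor` in place of an S3 ranged read. The names `write_blob` and `read_blob_at` are made up for illustration.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

// Hypothetical layout (an assumption, NOT the format from this PR):
//   [u32 LE payload length][payload bytes][u32 LE checksum of payload]

/// FNV-1a (32-bit), used only as a std-only stand-in for a real checksum.
fn fnv1a(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in data {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Appends one blob and returns the byte offset at which it starts.
fn write_blob<W: Write + Seek>(out: &mut W, payload: &[u8]) -> io::Result<u64> {
    let start = out.seek(SeekFrom::End(0))?;
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)?;
    out.write_all(&fnv1a(payload).to_le_bytes())?;
    Ok(start)
}

/// Reads only the blob starting at `offset` and verifies its checksum.
/// With remote storage, the seek would become a ranged read instead.
fn read_blob_at<R: Read + Seek>(src: &mut R, offset: u64) -> io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(offset))?;
    let mut word = [0u8; 4];
    src.read_exact(&mut word)?;
    let len = u32::from_le_bytes(word) as usize;
    let mut payload = vec![0u8; len];
    src.read_exact(&mut payload)?;
    src.read_exact(&mut word)?;
    if u32::from_le_bytes(word) != fnv1a(&payload) {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "checksum mismatch"));
    }
    Ok(payload)
}

fn main() -> io::Result<()> {
    // One "file" holding blobs for several checkpoints.
    let mut file = Cursor::new(Vec::new());
    let payloads: [&[u8]; 3] = [b"checkpoint-0", b"checkpoint-1", b"checkpoint-2"];
    let mut offsets = Vec::new();
    for p in payloads {
        offsets.push(write_blob(&mut file, p)?);
    }
    // Fetch only the middle blob, without touching the others.
    let blob = read_blob_at(&mut file, offsets[1])?;
    assert_eq!(blob, b"checkpoint-1");
    Ok(())
}
```

Because each blob carries its own length and checksum, a reader in any language can jump to a known offset, read one blob, and validate it without parsing the rest of the file.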
If we are looking for a format that we are already tied to, and that we must maintain for anything to work, then we can serialize the structures inside the blobs as BCS. We already use this for our hashing / signatures, so the need to maintain it is not new. Storing BCS also makes it easier to compute hashes without re-serializing, but that is a different concern.
Agreed, storing the blobs as BCS makes sense because of its canonical serialized representation.
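As a rough illustration of why a canonical encoding helps here, the sketch below hand-encodes a small struct following the BCS rules for fixed-width integers (little-endian) and byte sequences (ULEB128 length prefix, then the bytes). The `CheckpointBlob` struct and its fields are hypothetical, and real code would presumably use the `bcs` crate rather than this hand-rolled encoder.

```rust
/// ULEB128 encoding, which BCS uses for sequence lengths.
fn uleb128(mut n: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7f) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Hypothetical blob contents, for illustration only.
struct CheckpointBlob {
    sequence_number: u64,
    contents: Vec<u8>,
}

/// Encodes the struct field by field, in declaration order, with no tags.
fn to_bcs_bytes(blob: &CheckpointBlob) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&blob.sequence_number.to_le_bytes());
    uleb128(blob.contents.len() as u32, &mut out);
    out.extend_from_slice(&blob.contents);
    out
}

fn main() {
    let blob = CheckpointBlob { sequence_number: 1, contents: vec![0xaa, 0xbb] };
    let bytes = to_bcs_bytes(&blob);
    // 8 bytes of little-endian u64, then length 2, then the two content bytes.
    assert_eq!(bytes, vec![1, 0, 0, 0, 0, 0, 0, 0, 2, 0xaa, 0xbb]);
}
```

Since a given value has exactly one such byte representation, the bytes stored in the archive are the same bytes that would be fed to a hash function, so no re-serialization step is needed before hashing.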