aptos-core
[State Sync] Add data compression to state sync.
Description
This PR adds data compression to state sync (i.e., it utilizes the recently added `aptos-compression` crate: https://github.com/aptos-labs/aptos-core/pull/2232). At a high level:
- Storage service clients will specify (in the `StorageServiceRequest`) whether or not the response should be compressed. Compression can be configured via the `AptosDataClientConfig`; the default is true (a sketch of this toggle is shown after this list).
- If the client requests compression, the server will compress the response (see `StorageServiceResponse`).
- Compression happens at the application layer (instead of at the network layer). There are obviously trade-offs with this approach, but the benefits are: (i) clients can enable/disable compression easily; (ii) the storage service can serve cached responses directly to clients, without needing to re-compress data for every request; and (iii) we can selectively compress data, e.g., there's no need to compress all messages, because some are tiny and might actually become larger after compression (like the client requests themselves).
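To make the client-side configuration concrete, here is a minimal sketch of the toggle described above. The `AptosDataClientConfig` name comes from the PR, but the `use_compression` field and the surrounding code are assumptions made purely for illustration:

```rust
// Sketch only: the field name `use_compression` and this struct layout are
// assumptions for illustration; they may not match the actual aptos-core config.

#[derive(Clone, Debug)]
pub struct AptosDataClientConfig {
    /// Whether storage service responses should be requested in compressed form.
    pub use_compression: bool,
}

impl Default for AptosDataClientConfig {
    fn default() -> Self {
        // Compression is enabled by default.
        Self { use_compression: true }
    }
}

fn main() {
    let config = AptosDataClientConfig::default();
    // Each outgoing StorageServiceRequest would carry this flag so that the
    // storage service knows whether to compress the response payload.
    println!("requesting compressed responses: {}", config.use_compression);
}
```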
A couple notes:
- Most of this PR is just updating the tests and adding new ones. At the core, we're essentially just turning the existing `StorageServiceRequest` and `StorageServiceResponse` messages into enums (to support compression); a sketch of what this might look like follows these notes.
- This relates to: https://github.com/aptos-labs/aptos-core/issues/541
- We've forked the `lz4` and `rust-rocksdb` crates to work around temporary issues: https://github.com/rust-rocksdb/rust-rocksdb/issues/666. The forks will be removed once our PRs are accepted.
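As a rough illustration of the enum change, the sketch below shows one possible shape for a response type that carries either raw or compressed bytes. The variant names and helper methods are assumptions, and `lz4_flex` is used here only as a stand-in for the `aptos-compression` crate:

```rust
// Sketch only: variant names and the use of lz4_flex are illustrative assumptions.

/// A response is either raw bytes or an lz4-compressed blob, so already-compressed
/// cached responses can be served directly without re-compressing per request.
#[derive(Clone, Debug)]
pub enum StorageServiceResponse {
    CompressedResponse(Vec<u8>),
    RawResponse(Vec<u8>),
}

impl StorageServiceResponse {
    /// Build a response, compressing the payload only if the client asked for it.
    pub fn new(payload: Vec<u8>, use_compression: bool) -> Self {
        if use_compression {
            Self::CompressedResponse(lz4_flex::compress_prepend_size(&payload))
        } else {
            Self::RawResponse(payload)
        }
    }

    /// Recover the original payload on the receiving side.
    pub fn into_payload(self) -> Result<Vec<u8>, lz4_flex::block::DecompressError> {
        match self {
            Self::CompressedResponse(bytes) => lz4_flex::decompress_size_prepended(&bytes),
            Self::RawResponse(bytes) => Ok(bytes),
        }
    }
}
```

Keeping the compressed bytes as an explicit variant is what allows the storage service to cache responses in their compressed form and serve them directly, which is benefit (ii) above.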
Test Plan
New tests have been added, including:
- Two smoke tests (for validators and fullnodes, testing with compression disabled). All other smoke tests use compression by default.
- Several unit tests (e.g., to check that the config values work as expected and that the client-server interaction is valid); a sketch of the kind of round-trip check involved follows this list.
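As a flavor of what one of those checks might look like, here is a small, self-contained round-trip test. This is not the actual test code, and `lz4_flex` again stands in for the `aptos-compression` crate:

```rust
// Sketch only: illustrates a compression round-trip check, not the actual tests.
#[cfg(test)]
mod tests {
    #[test]
    fn compressed_payload_round_trips() {
        let payload = vec![7u8; 4096];
        let compressed = lz4_flex::compress_prepend_size(&payload);
        let decompressed = lz4_flex::decompress_size_prepended(&compressed)
            .expect("decompression should succeed");
        assert_eq!(payload, decompressed);
    }
}
```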
:white_check_mark: Forge test success on ded7a893d01acaab9946c0a3a7263f2855725041
all up: 5224 TPS, 2097 ms latency, 2900 ms p99 latency, no expired txns
- Grafana dashboard
- Validator 0 logs
- Humio Logs
- Test runner output
- Test run 1 is land-blocking
@JoshLind - Can we summarize the improvements we saw, maybe the volume of data as well as the speed of state sync after compression is enabled?
@sitalkedia:
Can we summarize the improvements we saw, maybe the volume of data as well as the speed of state sync after compression is enabled?
Cool, added a small comment to the PR description, but I didn't want to make it too prominent given that the experimentation is highly anecdotal. I'm much more interested in seeing how this works in the real world.
@JoshLind - Do you plan to add compression support to consensus as well? I believe consensus messages, particularly proposals, can benefit a lot from compression.
@sitalkedia:
Do you plan to add compression support to consensus as well? I believe consensus messages, particularly proposals, can benefit a lot from compression.
Yep, already on it and have a working PR. Just need to land this first (need to work with @msmouse to get around a tricky situation with upgrading `rust-rocksdb` and our execution benchmarks).
:white_check_mark: Forge test success on 07ac136807d7cbc4367cea8ff80680382eaf8547
all up: 6741 TPS, 4378 ms latency, 8150 ms p99 latency, no expired txns
- Grafana dashboard
- Validator 0 logs
- Humio Logs
- Test runner output
- Test run 1 is land-blocking
@JoshLind, do we have a timeline on when we can cut away from this? In my testing, pulling all the git repos and submodules seems to add (at least locally, though my internet is pretty fast) about 2 minutes to the build.
do we have a timeline on when we can cut away from this? In my testing, pulling all the git repos and submodules seems to add (at least locally, though my internet is pretty fast) about 2 minutes to the build.
@banool, do you have more information?
- If you're asking about compression (generally), I don't see us moving away from this. But, to get this PR to work, we did have to fork 2 crates (one of which is `rust-rocksdb`, which might be the culprit for the additional build time...).
- We're waiting on outstanding PRs to the upstream crates to land (to kill our forks), but I'm not sure that'll even help the build time unless we have a breakdown of where the overhead is coming from.