materialize icon indicating copy to clipboard operation
materialize copied to clipboard

adapter/compute: Optimize environmentd handling of PeekResponse

Open ParkMyCar opened this issue 10 months ago • 2 comments

This PR optimizes PeekResponses by replacing Vec<Row> with a new RowCollection type which stores a collection of Rows in a contiguous blob. It makes the following changes:

  1. PeekResponse generally requires less memory because we only allocate the exact amount needed to encode a row and only have 8 bytes of overhead for the offset. Opposed to 24 bytes for a Row which will store 23 bytes inline. For small rows (e.g. a single int) this leads to possibly wasted space, and for large rows that spill to the heap it requires 24 bytes of overhead.
  2. ProtoPeekResponse now sends a blob of bytes (i.e. a Rust bytes::Bytes) which are a contiguous blob of Rows encoded in their own wire format. This makes handling decoding the rows from a PeekResponse zero-copy. Previously we were using Vec<ProtoRow> which resulted in multiple allocations per-Row.
  3. RowSetFinishing no longer "explodes" Vec<(Row, Diff)> tuples into a Vec<Row>, we operate on the (Row, Diff) format.
  4. RowSetFinishing creates a "sorted view" on top of the binary data from the network response, preventing the need to copy or move any of the row data itself.
  5. pgwire and http protocol layers consume a newly defined RowIterator instead of Vec<Row>.

As far as I can tell, handling a PeekResponse now only requires 6 allocations:

  1. Creating a Vec<(usize, NonZeroUsize)> for offsets and diffs when deserializing a ProtoPeekResponse
  2. Creating a correctly sized Vec<usize> when creating a sorted view
  3. Creating two DatumVecs, also for when we create our sorted view
  4. Optionally creating a Row and DatumVec which are reusable buffers when projecting columns.

Previously the number of allocations was O(n) where n is the number of rows in a response

Motivation

Reduce the amount of memory used by environmentd when handling a PeekResponse

Fixes https://github.com/MaterializeInc/materialize/issues/26712

Tips for reviewer

This PR ended up being bigger than I thought it would be, so I split into 4 commits which can be reviewed independently:

  1. Introduce a RowRef type, where RowRef:Row::str:String. I added a RowRef type because we needed an interface that would allow us to reference as slice of bytes (i.e. &[u8]) as row data.
  2. Introduce the RowCollection type and RowIterator trait. RowCollection is the contiguous slice of bytes that we read RowRefs out of. Then RowIterator (as described in a doc comment) is a "lending iterator" for Rows.
  3. Changes to compute code to return a RowCollection in PeekResponse
  4. Changes to adapter code to use a Box<dyn RowIterator> instead of Vec<Row>.

Checklist

  • [ ] This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • [ ] This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • [ ] If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • [x] This PR includes the following user-facing behavior changes:
    • Performance improvements for handling query responses.

ParkMyCar avatar Apr 22 '24 02:04 ParkMyCar

Risk Score:81 / 100 Bug Hotspots:4 Resilience Coverage:50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • [x] (Required) Code Review 🔍 Detected
  • [ ] (Required) Feature Flag
  • [x] (Required) Integration Test 🔍 Detected
  • [ ] (Required) Observability
  • [x] (Required) QA Review 🔍 Detected
  • [ ] (Required) Run Nightly Tests
  • [x] Unit Test 🔍 Detected
Risk Summary:

The pull request carries a high risk, with a score of 81, indicating a significant likelihood of introducing bugs. This assessment is driven by predictors such as the sum of bug reports of files affected and changes in executable lines of code. Historically, pull requests sharing these characteristics have been 114% more likely to lead to bugs compared to the repository's baseline. Additionally, the PR touches 4 files that are considered hotspots for bugs, which further contributes to the risk. While the repository's bug trends are predicted to decrease, this PR still poses a substantial risk based on historical data and current changes.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots: What's This?

File Percentile
../src/util.rs 91
../sequencer/inner.rs 95
../src/compute_state.rs 97
../inner/create_materialized_view.rs 91

shepherdlybot[bot] avatar Apr 26 '24 16:04 shepherdlybot[bot]

This is fantastic work. I'm not going to have time to even glance at the code, but I read the description and I'm very excited about the optimizations you've made here.

benesch avatar Apr 26 '24 17:04 benesch

I triggered a nightly build:

  • the Postgres inconsistency failure is unrelated
  • testdrive w/ 4 replicas has some issues but I would assume that they are unrelated to the changes; please rebase on main to confirm

nrainer-materialize avatar May 07 '24 12:05 nrainer-materialize