[Epic] Improve interactive query times
Interactive queries (`SELECT`s that are not able to use the fast path) are possibly slower than we would like. This epic tracks improving interactive query times.
This epic focuses on improvements that can be made within the COMPUTE layer, but may yield findings that are useful for performance improvements in other layers, e.g., ADAPTER and STORAGE.
## Goals
Make interactive queries reasonably fast, as long as this is possible through improvements in the COMPUTE layer.
## Acceptance Criteria
- We have defined performance targets for interactive queries.
- Interactive queries are fast enough, according to our targets.
- Interactive queries still function correctly.
## Tasks
- [x] https://github.com/MaterializeInc/materialize/issues/13885
- [x] https://github.com/MaterializeInc/materialize/issues/13887
- [ ] Test interactive query performance
- [ ] Define test cases and objectives: https://www.notion.so/materialize/a489a8c3374e46cda81f1e2892ead59a?v=ee8c363052ba43db8c1e75d85dda32d2&p=f99a5ab89ca44555a5734431de385081&pm=s
- [ ] Perform measurements and analyses of query times
- [ ] Create playbook document for approaching related performance improvement requests
- [ ] Create issues for further work on performance improvements.
- [ ] Identify and implement performance improvements still in scope of epic
## Testing

## Bugs

## Related
The following issues and PRs were motivated by discussion and findings within the context of this epic, but are not part of this epic's scope:
- https://github.com/MaterializeInc/materialize/issues/13489
- https://github.com/MaterializeInc/materialize/pull/14028
## Questions
- What are @frankmcsherry's insights on speeding up interactive queries?
Imo we should:
- Make sure we can measure interactive query times. We have introspection views for peek latencies, and tbh these haven't usually been very high. In fact, they are potentially deceptively small. So, I would first make sure we can track the end-to-end latency for compute providing answers to peeks.
- Some amount of end-to-end latency is data availability. For reasons of consistency, ADAPTER may pose queries at times not yet available in the data. We needn't beat ourselves up over these; they cannot go fast with the parameters chosen. Ideally, we could break down the end-to-end latencies and report when data were first available.
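The latency breakdown suggested above could be sketched as follows. This is a toy illustration, not Materialize code: all names (`PeekTiming`, `available_at`, etc.) and numbers are made up, and in practice the availability time would have to come from the system itself.

```python
from dataclasses import dataclass


@dataclass
class PeekTiming:
    """Hypothetical timestamps (in seconds) collected for one peek."""
    issued_at: float     # when the peek was submitted to COMPUTE
    available_at: float  # when data for the chosen timestamp was first available
    answered_at: float   # when COMPUTE returned the result


def breakdown(t: PeekTiming) -> dict:
    # Time spent waiting for data to become available cannot be blamed
    # on COMPUTE: with the chosen timestamp, the query could not go faster.
    availability_wait = max(0.0, t.available_at - t.issued_at)
    # The remainder is the part COMPUTE is actually responsible for.
    compute = t.answered_at - max(t.issued_at, t.available_at)
    return {
        "availability_wait": availability_wait,
        "compute": compute,
        "end_to_end": t.answered_at - t.issued_at,
    }


timing = PeekTiming(issued_at=0.0, available_at=0.3, answered_at=0.5)
print(breakdown(timing))
```

Reporting such a breakdown per peek would let us focus effort on the `compute` portion and avoid chasing latencies that are fixed by the chosen timestamp.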
Having done those, I think what @uce might be getting at is that we currently deploy `SELECT` queries as standing dataflows whose answers we peek at once they are ready. They could perhaps be more efficiently deployed as dataflows whose inputs we terminate immediately (once all relevant data have passed through), avoiding any subsequent maintenance work that won't contribute to the `SELECT` result. This also has the potential to simplify some of the operators (joins are able to discard their input arrangements if the opposing input has run dry; this can dramatically improve running times, or harm them if the dropping work is done instead of just producing the output; we'll have to see).
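To illustrate why a terminated input helps a join, here is a toy sketch in plain Python (not differential dataflow; the function name and shape are invented for this example): once one input has run dry, only that side needs to stay arranged, and the other side can be streamed through without building an arrangement at all.

```python
def terminating_hash_join(closed_side, streamed_side):
    """Toy join where `closed_side` is known to have run dry.

    In a standing dataflow, both inputs must remain arranged so that
    future updates on either side can find their matches. Once one
    side is complete, the other side can be streamed against it and
    discarded immediately, which is the saving described above.
    """
    # Arrange the finished input once; this is the only state we keep.
    arrangement = {}
    for key, value in closed_side:
        arrangement.setdefault(key, []).append(value)

    # Stream the other input through; no arrangement is built for it.
    results = []
    for key, value in streamed_side:
        for matched in arrangement.get(key, []):
            results.append((key, matched, value))
    return results


print(terminating_hash_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")]))
```

The trade-off mentioned above shows up here too: tearing down state has its own cost, so whether dropping arrangements wins depends on how much maintenance work it actually avoids.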
As discussed in a meeting, one thing we might do is introduce "source frontiers" like we have "materialization frontiers", reporting for each source the frontier through which we have received data. This would allow COMPUTE to more specifically report on whether it is in a position to respond to an "interactive query", and from which moment the latency might be attributed to COMPUTE rather than ADAPTER (timestamp too far in the future) or STORAGE (data not yet durable).
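A rough sketch of the attribution this would enable, assuming hypothetical frontier values (the function, names, and exact classification rules are illustrative, not an existing API):

```python
def attribute_delay(query_ts, durable_frontier, received_frontier):
    """Toy classification of which layer a pending peek is waiting on.

    Hypothetical frontiers, following the comment above:
    - `durable_frontier`: STORAGE has made data durable up to this time.
    - `received_frontier`: COMPUTE has received data up to this time
      (the proposed "source frontier").
    """
    if query_ts >= durable_frontier:
        # Data for this timestamp is not yet durable.
        return "STORAGE"
    if query_ts >= received_frontier:
        # Timestamp is ahead of what COMPUTE has received.
        return "ADAPTER"
    # All inputs are present; remaining latency is COMPUTE's to explain.
    return "COMPUTE"


print(attribute_delay(query_ts=4, durable_frontier=5, received_frontier=3))
```

With such a classification, only the "COMPUTE" bucket would count against the performance targets of this epic.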
Related tickets from times past:
- #6738
- #3740
Tickets about loss of interactivity due to issues in the dataflow layer. I guess those also count towards improved interactive query times:
- #8853
- #7577
- #7575