[Epic] Improve interactive query times
Interactive queries (`SELECT`s that are not able to use the fast path) are possibly slower than we would like. This epic tracks improving interactive query times.
This epic focuses on improvements that can be made within the COMPUTE layer, but may yield findings that are useful for performance improvements in other layers, e.g., ADAPTER and STORAGE.
## Goals
Make interactive queries reasonably fast, as long as this is possible through improvements in the COMPUTE layer.
## Acceptance Criteria
- We have defined performance targets for interactive queries.
- Interactive queries are fast enough, according to our targets.
- Interactive queries still function correctly.
## Tasks
- [x] https://github.com/MaterializeInc/materialize/issues/13885
- [x] https://github.com/MaterializeInc/materialize/issues/13887
- [ ] Test interactive query performance
- [ ] Define test cases and objectives: https://www.notion.so/materialize/a489a8c3374e46cda81f1e2892ead59a?v=ee8c363052ba43db8c1e75d85dda32d2&p=f99a5ab89ca44555a5734431de385081&pm=s
- [ ] Perform measurements and analyses of query times
- [ ] Create playbook document for approaching related performance improvement requests
- [ ] Create issues for further work on performance improvements.
- [ ] Identify and implement performance improvements still in scope of epic
## Testing

## Bugs

## Related
The following issues and PRs were motivated by discussion and findings within the context of this epic, but are not part of this epic's scope:
- https://github.com/MaterializeInc/materialize/issues/13489
- https://github.com/MaterializeInc/materialize/pull/14028
## Questions
- What are @frankmcsherry's insights on speeding up interactive queries?
Imo we should:
- Make sure we can measure interactive query times. We have introspection views for peek latencies, and tbh these haven't usually been very high. In fact, they are potentially deceptively small. So, I would first make sure we can track the end-to-end latency for compute providing answers to peeks.
- Some amount of end-to-end latency is data availability. For reasons of consistency, ADAPTER may pose queries at times not yet available in the data. We needn't beat ourselves up over these; they cannot go fast with the parameters chosen. Ideally, we could break down the end-to-end latencies and report when data were first available.
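The latency breakdown suggested above could be sketched as follows. This is a toy illustration, not Materialize code: all names (`PeekTiming`, `available_at`, etc.) and numbers are made up, and in practice the availability time would have to come from the system itself.

```python
from dataclasses import dataclass


@dataclass
class PeekTiming:
    """Hypothetical timestamps (in seconds) collected for one peek."""
    issued_at: float     # when the peek was submitted to COMPUTE
    available_at: float  # when data for the chosen timestamp was first available
    answered_at: float   # when COMPUTE returned the result


def breakdown(t: PeekTiming) -> dict:
    # Time spent waiting for data to become available cannot be blamed
    # on COMPUTE: with the chosen timestamp, the query could not go faster.
    availability_wait = max(0.0, t.available_at - t.issued_at)
    # The remainder is the part COMPUTE is actually responsible for.
    compute = t.answered_at - max(t.issued_at, t.available_at)
    return {
        "availability_wait": availability_wait,
        "compute": compute,
        "end_to_end": t.answered_at - t.issued_at,
    }


timing = PeekTiming(issued_at=0.0, available_at=0.3, answered_at=0.5)
print(breakdown(timing))
```

Reporting such a breakdown per peek would let us focus effort on the `compute` portion and avoid chasing latencies that are fixed by the chosen timestamp.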
Having done those, I think what @uce might be getting at is that we currently deploy `SELECT` queries as standing dataflows whose answers we peek at once they are ready. They could perhaps be more efficiently deployed as dataflows whose inputs we terminate immediately (once all relevant data have passed through), avoiding any subsequent maintenance work that won't contribute to the `SELECT` result. This also has the potential to simplify some of the operators (joins are able to discard their input arrangements if the opposing input has run dry; this can dramatically improve running times, or harm them if the dropping work is done instead of just producing the output; we'll have to see).
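To illustrate why a terminated input helps a join, here is a toy sketch in plain Python (not differential dataflow; the function name and shape are invented for this example): once one input has run dry, only that side needs to stay arranged, and the other side can be streamed through without building an arrangement at all.

```python
def terminating_hash_join(closed_side, streamed_side):
    """Toy join where `closed_side` is known to have run dry.

    In a standing dataflow, both inputs must remain arranged so that
    future updates on either side can find their matches. Once one
    side is complete, the other side can be streamed against it and
    discarded immediately, which is the saving described above.
    """
    # Arrange the finished input once; this is the only state we keep.
    arrangement = {}
    for key, value in closed_side:
        arrangement.setdefault(key, []).append(value)

    # Stream the other input through; no arrangement is built for it.
    results = []
    for key, value in streamed_side:
        for matched in arrangement.get(key, []):
            results.append((key, matched, value))
    return results


print(terminating_hash_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")]))
```

The trade-off mentioned above shows up here too: tearing down state has its own cost, so whether dropping arrangements wins depends on how much maintenance work it actually avoids.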
As discussed in a meeting, one thing we might do is introduce "source frontiers" like we have "materialization frontiers", reporting for each source the frontier through which we have received data. This would allow COMPUTE to more specifically report on whether it is in a position to respond to an "interactive query", and from which moment the latency might be attributed to COMPUTE rather than ADAPTER (timestamp too far in the future) or STORAGE (data not yet durable).
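A rough sketch of the attribution this would enable, assuming hypothetical frontier values (the function, names, and exact classification rules are illustrative, not an existing API):

```python
def attribute_delay(query_ts, durable_frontier, received_frontier):
    """Toy classification of which layer a pending peek is waiting on.

    Hypothetical frontiers, following the comment above:
    - `durable_frontier`: STORAGE has made data durable up to this time.
    - `received_frontier`: COMPUTE has received data up to this time
      (the proposed "source frontier").
    """
    if query_ts >= durable_frontier:
        # Data for this timestamp is not yet durable.
        return "STORAGE"
    if query_ts >= received_frontier:
        # Timestamp is ahead of what COMPUTE has received.
        return "ADAPTER"
    # All inputs are present; remaining latency is COMPUTE's to explain.
    return "COMPUTE"


print(attribute_delay(query_ts=4, durable_frontier=5, received_frontier=3))
```

With such a classification, only the "COMPUTE" bucket would count against the performance targets of this epic.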
Related tickets from times past:
- #6738
- #3740
Tickets about loss of interactivity due to issues in the dataflow layer. I guess those also count towards improved interactive query times:
- #8853
- #7577
- #7575