quickwit
Follow search performance on basic queries
We could use a restricted subset of the queries used in the tantivy benchmark.
It would also be interesting to measure the performance with parallel queries.
Let's split this into subtasks:
- [x] Load test-data
- [x] Create indexes via CLI - https://github.com/PSeitz/qw_build_index
- [x] Ingest via CLI
- [ ] Store and retrieve datasets from S3
- [ ] Run set of queries
- [ ] Have different queries defined in a yaml or toml
- [ ] Start server via CLI (multiple configs, matrix?, memory layout randomization)
- [ ] Warmup caches
- [ ] Run queries
- [ ] Retrieve and store benchmarks in a DB (could be Quickwit :)
- [ ] Have a dedicated machine running continuously
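
The "warmup caches" and "run queries" steps above, plus the parallel-query case, could be sketched roughly as follows. This is only a minimal illustration, not an existing Quickwit tool: `run_benchmark`, `time_query`, and the query names are hypothetical, and `search_fn` stands in for whatever actually sends a query to a running server. The query dict is assumed to have been loaded from the yaml or toml file mentioned in the list.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def time_query(search_fn: Callable[[str], object], query: str) -> float:
    """Return the wall-clock latency of one query, in seconds."""
    start = time.perf_counter()
    search_fn(query)
    return time.perf_counter() - start


def run_benchmark(
    search_fn: Callable[[str], object],
    queries: Dict[str, str],
    warmup_runs: int = 3,
    measured_runs: int = 10,
    concurrency: int = 1,
) -> Dict[str, Dict[str, float]]:
    """Warm up each named query, then time it `measured_runs` times.

    `search_fn` is a placeholder (assumption): plug in the real client call.
    """
    results: Dict[str, Dict[str, float]] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for name, query in queries.items():
            # Warmup: results are discarded, only the cache side effects matter.
            for _ in range(warmup_runs):
                search_fn(query)
            # Measured runs: up to `concurrency` requests are in flight at once.
            latencies = list(
                pool.map(lambda _: time_query(search_fn, query), range(measured_runs))
            )
            results[name] = {
                "min_s": min(latencies),
                "median_s": statistics.median(latencies),
                "max_s": max(latencies),
            }
    return results


if __name__ == "__main__":
    # Stand-in for a real search call, so the sketch runs without a server.
    def fake_search(query: str) -> list:
        time.sleep(0.001)
        return []

    # Hypothetical query set; in practice this would come from the yaml/toml file.
    sample_queries = {"term": "foo", "phrase": '"foo bar"'}
    for name, stats in run_benchmark(fake_search, sample_queries, concurrency=4).items():
        print(name, stats)
```

Keeping `search_fn` injectable means the same loop can be reused for every server config in the matrix, and the resulting dict is easy to store in whatever DB ends up holding the benchmark history.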
By the way, preparing a good benchmark set will help a lot with using PGO for Quickwit. Without some "generic" sample load, it is much more time-consuming to prepare a PGO-optimized binary.
Interesting to see how databend is doing it: https://github.com/datafuselabs/databend/issues/3084
Regarding PGO (and BOLT), these links may be helpful:
- ScyllaDB results: https://github.com/scylladb/scylladb/pull/10808
- Vector results: https://github.com/vectordotdev/vector/issues/15631
- Rust experience with LTO + PGO + BOLT: https://kobzol.github.io/rust/rustc/2022/10/27/speeding-rustc-without-changing-its-code.html