awesome-olap-paper
Papers related to OLAP techniques
Awesome-OLAP-Paper 
A curated paper list of awesome Online Analytical Processing databases, frameworks, resources, tools and other awesomeness, for data engineers.
New PRs are welcome. Please follow the format used by existing entries: paperName (with link) [MeetingName Year]
If the paper has open-source code, please include its GitHub link in the Meeting field.
- Awesome-OLAP-Paper
- Query-Aware Database Generation
- Survey
- Query Schedule
- Query Optimization
- Query Rewrite
- Cardinality Estimation
- Histogram
- Sampling
- Others
- Survey
- Join Order
- Join Algorithms
- Cost Model
- View
- Survey
- Index
- Query Execution
- Data Dependency Search
- Query Compilation
- Logic Bugs Detection
- Storage
- LSM-Tree
- Proxy
- Data Loading
- Database Kernel
- Others
- MVCC
- HTAP
- System Architecture
- Linear Consistency
- Sequential Consistency
- Session Consistency
- Survey
- Kernel Optimization
- Result Replay
- System Architecture
- Benchmark
- Time Series
- Vector Data
- OLTP
- AI4DB
- Industry
Query-Aware Database Generation
- QAGen: Generating Query-Aware Test Databases [SIGMOD 07]
- Generating Targeted Queries for Database Testing [SIGMOD 08]
- Generating Databases for Query Workloads [VLDB 10]
- Data Generation using Declarative Constraints [SIGMOD 11]
- MyBenchmark: generating databases for query workloads [VLDB 14]
- Scalable and Dynamic Regeneration of Big Data Volumes [EDBT 18]
- Touchstone: Generating Enormous Query-Aware Test Databases [ATC 18]
- Synthesizing Linked Data Under Cardinality and Integrity Constraints [SIGMOD 21]
- Projection-Compliant Database Generation [VLDB 22]
- SAM: Database Generation from Query Workloads with Supervised Autoregressive Models [SIGMOD 22]
- PrivLava: Synthesizing Relational Data with Foreign Keys under Differential Privacy [SIGMOD 23]
- Mirage: Generating Enormous Databases for Complex Workloads [ICDE 24]
Survey
Query Schedule
Query Optimization
- Sampling-Based Query Re-Optimization [SIGMOD 16]
- Kepler: Robust Learning for Parametric Query Optimization [SIGMOD 23]
- Rethink Query Optimization in HTAP Databases [SIGMOD 24]
- Optimizing Nested Recursive Queries [SIGMOD 24]
Query Rewrite
- QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting [VLDB 23]
- SlabCity: Whole-Query Optimization using Program Synthesis [VLDB 23]
- GEqO: ML-Accelerated Semantic Equivalence Detection [SIGMOD 24]
- Proving Query Equivalence Using Linear Integer Arithmetic [SIGMOD 24]
Cardinality Estimation
Histogram
- Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries [SIGMOD 88]
- Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results [ACM Transactions on Database Systems 93]
- Independence is good: Dependency-based histogram synopses for high-dimensional data [SIGMOD 01]
- STHoles: a multidimensional workload-aware histogram [SIGMOD 01]
- A multi-dimensional histogram for selectivity estimation and fast approximate query answering [CASCON 03]
- The history of histograms (abridged) [VLDB 03]
- ISOMER: Consistent histogram construction using query feedback [ICDE 06]
- Join Over Histograms [Alberto Dell'Era 07]
- Improving accuracy and robustness of self-tuning histograms by subspace clustering [ICDE 16]
- LHist: Towards Learning Multidimensional Histogram for Massive Spatial Data [ICDE 21]
Sampling
- Two-Level Sampling for Join Size Estimation [SIGMOD 17]
- Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing [SIGMOD 21]
Others
- Access path selection in a relational database management system [SIGMOD 79]
- Approximating multi-dimensional aggregate range queries over real attributes [SIGMOD 00]
- Selectivity estimators for multidimensional range queries over real attributes [VLDB 05]
- Plan Bouquets: Query Processing without Selectivity Estimation [SIGMOD 14]
- Exact Cardinality Query Optimization with Bounded Execution Cost [SIGMOD 19]
- JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation [SIGMOD 23]
- Efficient and Effective Cardinality Estimation for Skyline Family [SIGMOD 23]
Survey
- Preventing bad plans by bounding the impact of cardinality estimation errors [VLDB 09]
- Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server [VLDB 23]
- Sub-optimal Join Order Identification with L1-error [SIGMOD 24]
Join Order
- Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges [VLDB 23]
- Efficiently Computing Join Orders with Heuristic Search [SIGMOD 23]
- Ready to Leap (by Co-Design)? Join Order Optimisation on Quantum Hardware [SIGMOD 23]
- Quantum-Inspired Digital Annealing for Join Ordering [VLDB 24]
- POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance [VLDB 24]
Join Algorithms
- Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems [VLDB 12]
- Leapfrog Triejoin: a worst-case optimal join algorithm [ICDT 14]
- An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory [SIGMOD 16]
- Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems [SIGMOD 18]
- Adopting Worst-Case Optimal Joins in Relational Database Systems [VLDB 20]
- Free Join: Unifying Worst-Case Optimal and Traditional Joins [arXiv 23]
- Reservoir Sampling over Joins [SIGMOD 24]
Cost Model
- LEO – DB2’s LEarning Optimizer [VLDB 01]
- Predicting query execution time: are optimizer cost models really unusable? [ICDE 13]
- Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads [VLDB 13]
- Forecasting the cost of processing multi-join queries via hashing for main-memory databases [SoCC 15]
- Query Performance Prediction for Concurrent Queries using Graph Embedding [VLDB 20]
- Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload [arXiv 21]
- Rethinking Learned Cost Models: Why Start from Scratch? [SIGMOD 24]
- Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools [SIGMOD 24]
View
Survey
- How Good Are Query Optimizers, Really? [VLDB 15]
- Cardinality Estimation: An Experimental Survey [VLDB 17]
- A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration [VLDB 21]
- Have query optimizers hit the wall? [VLDB Journal 22]
- Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation [VLDB 22]
- Data dependencies for query optimization: a survey [VLDB Journal 22]
- Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis [VLDB 23]
Index
- SQL Server Column Store Indexes [SIGMOD 11]
- Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation [SIGMOD 18]
Query Execution
- MonetDB/X100: Hyper-Pipelining Query Execution [CIDR 05]
- Materialization Strategies in the Vertica Analytic Database: Lessons Learned [ICDE 13]
- Rethinking SIMD Vectorization for In-Memory Databases [SIGMOD 15]
- Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? [SIGMOD 17]
- Building Advanced SQL Analytics From Low-Level Plan Operators [SIGMOD 21]
- ChainedFilter: Combining Membership Filters by Chain Rule [SIGMOD 24]
Data Dependency Search
Query Compilation
- How to Architect a Query Compiler [SIGMOD 16]
- Adaptive Execution of Compiled Queries [ICDE 18]
Logic Bugs Detection
- Detecting Logic Bugs of Join Optimizations in DBMS [SIGMOD 23 Best Paper]
Storage
- What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines [VLDB 23]
- An Empirical Evaluation of Columnar Storage Formats [VLDB 24]
LSM-Tree
- Dissecting, Designing, and Optimizing LSM-based Data Stores [SIGMOD 22 Tutorial]
Proxy
Data Loading
Database Kernel
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics [CIDR 21]
- Disaggregated Database Systems [VLDB 23 Tutorial]
- GPU Database Systems Characterization and Optimization [VLDB 24]
- The Art of Latency Hiding in Modern Database Engines [VLDB 24]
- DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay [SIGMOD 24]
Others
MVCC
- Scalable Garbage Collection for In-Memory MVCC Systems [VLDB 13]
- Rethinking serializable multiversion concurrency control [VLDB 15]
- An Empirical Evaluation of In-Memory Multi-Version Concurrency Control [VLDB 17]
- Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting [SIGMOD 18]
- Long-lived Transactions Made Less Harmful [SIGMOD 20]
- Rethink the Scan in MVCC Databases [SIGMOD 21]
- Diva: Making MVCC Systems HTAP-Friendly [SIGMOD 22]
- Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems [VLDB 22]
- Scalable and Robust Snapshot Isolation for High-Performance Storage Engines [VLDB 23]
- One-shot Garbage Collection for In-memory OLTP through Temporality-aware Version Storage [SIGMOD 23]
HTAP
System Architecture
Linear Consistency
- HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots [ICDE 11]
- TiDB: A Raft-based HTAP Database [VLDB 20]
- OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster [VLDB 23]
Sequential Consistency
- BatchDB: Efficient Isolated Execution of Hybrid OLTP+OLAP Workloads for Interactive Applications [SIGMOD 17]
- F1 Lightning: HTAP as a Service [VLDB 20]
- Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing [ATC 21]
- ByteHTAP: ByteDance’s HTAP System with High Data Freshness and Strong Data Consistency [VLDB 22]
Session Consistency
- PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database [VLDB 18]
- PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba [SIGMOD 23]
Survey
- HTAP Databases: What is New and What is Next [SIGMOD 22]
- Data Sharing Model and Optimization Strategies in HTAP Database Systems [Journal of Software 23]
- HTAP Databases: A Survey [TKDE 24]
Kernel Optimization
- Log Replaying for Real-Time HTAP: An Adaptive Epoch-based Two-Stage Framework [ICDE 24]
Result Replay
Benchmark
- How Good is My HTAP System? [SIGMOD 22]
- OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems [ICDE 22]
- Dike: A Benchmark Suite for Distributed Transactional Databases [SIGMOD 23]
- M2Bench: A Database Benchmark for Multi-Model Analytic Workloads [VLDB 23]
- Cloud Analytics Benchmark [VLDB 23]
- Pollock: A Data Loading Benchmark [VLDB 23]
- VeriBench: Analyzing the Performance of Database Systems with Verifiability [VLDB 23]
- TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications [VLDB 23]
- CDSBen: Benchmarking the Performance of Storage Services in Cloud-native Database System at ByteDance [VLDB 23]
- FEBench: A Benchmark for Real-Time Relational Data Feature Extraction [VLDB 23]
- TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems [VLDB 23]
- ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems [VLDB 23]
- DBPA: A Benchmark for Transactional Database Performance Anomalies [SIGMOD 23]
- HyBench: A New Benchmark for HTAP Databases [VLDB 24]