
πŸ¦– A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

Results 266 opteryx issues

as of 0.19.1a956

~~~sql
/*28*/
SELECT
    REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k,
    AVG(length(Referer)) AS l,
    COUNT(*) AS c,
    MIN(Referer)
FROM hits
WHERE Referer <> ''
GROUP BY REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1')
HAVING COUNT(*)...
~~~

- [ ] Row estimates
- [ ] Correlated filters
- [ ] bloom filter disabled
- [ ] bloom filter on
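One of the items above is a bloom filter. A minimal sketch of one (hypothetical class, not the engine's implementation; sizes and hash scheme are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter sketch: k hash probes over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item: bytes):
        # Derive k independent probe positions by personalizing blake2b per probe.
        for seed in range(self.k):
            digest = hashlib.blake2b(item, person=seed.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

A filter like this lets a scan skip chunks that definitely do not contain a sought value, at the cost of occasional false positives.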

High Priority 1️⃣

The reader statistics should be updated:

- calls = number of files/blobs/chunks read
- records_in = number of pre-filtered records
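As a sketch (the counter names `calls` and `records_in` come from the issue; the container and method are hypothetical), a reader could accumulate these like:

```python
from dataclasses import dataclass

@dataclass
class ReaderStatistics:
    calls: int = 0        # number of files/blobs/chunks read
    records_in: int = 0   # number of pre-filtered records

    def record_read(self, num_records: int) -> None:
        """Bump the counters once per file/blob/chunk read."""
        self.calls += 1
        self.records_in += num_records
```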

Yes, you can use min/max values, row counts, and metadata from Parquet row groups and Iceberg file-level statistics to estimate the distribution of a column without reading the full dataset....
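As a minimal sketch of that idea, assuming per-row-group `(min, max, num_rows)` statistics are already extracted (Parquet footers and Iceberg manifests carry these), the row count for a `column <= value` predicate can be estimated by interpolating within each row group under a uniform-distribution assumption:

```python
def estimate_matching_rows(row_groups, value):
    """Estimate rows satisfying `column <= value` from (min, max, num_rows) stats only.

    Assumes values are uniformly distributed within each row group's [min, max] range.
    """
    estimate = 0.0
    for lo, hi, rows in row_groups:
        if value >= hi:
            estimate += rows          # whole row group qualifies
        elif value < lo:
            continue                  # whole row group can be pruned
        else:
            # Partial overlap: interpolate under the uniform assumption.
            estimate += rows * (value - lo) / (hi - lo)
    return estimate
```

The uniform assumption is crude but cheap; skewed columns would need histograms or sampling for better estimates.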

~~~python
import numpy as np
import math

def estimate_selectivity(
    lower: float,
    upper: float,
    total_records: int,
    cardinality: int,
    value: float,
    filter_type: str
) -> float:
    """
    Estimates filter selectivity given column...
~~~
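The body of that function is truncated above. A completed sketch of the same signature (my assumptions: a uniform distribution over `[lower, upper]`, and only `"eq"`, `"lt"`, `"gt"` filter types) could look like:

```python
def estimate_selectivity(
    lower: float,
    upper: float,
    total_records: int,
    cardinality: int,
    value: float,
    filter_type: str,
) -> float:
    """Estimate the fraction of rows passing a filter, assuming a uniform distribution."""
    if total_records == 0 or upper <= lower:
        return 0.0
    if filter_type == "eq":
        # Equality: assume each distinct value is equally likely.
        return 1.0 / max(cardinality, 1) if lower <= value <= upper else 0.0
    if filter_type == "lt":
        return min(max((value - lower) / (upper - lower), 0.0), 1.0)
    if filter_type == "gt":
        return min(max((upper - value) / (upper - lower), 0.0), 1.0)
    raise ValueError(f"unknown filter_type: {filter_type}")
```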

There may be a way to evaluate filters in parallel without a full-blown parallel engine.
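One way to sketch that (hypothetical `batches`/`predicate` names): parallelize only the filter step over record batches with a thread pool, leaving the rest of the pipeline single-threaded:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_filter(batches, predicate, max_workers: int = 4):
    """Apply `predicate` to each batch concurrently; batch order is preserved by map()."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda batch: [row for row in batch if predicate(row)], batches))
```

For CPU-bound pure-Python predicates the GIL limits the gain, but filters that drop into native code (NumPy, Arrow compute) release the GIL and can overlap.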

Here’s an optimized Cython implementation using memcmp for byte-wise comparison instead of Python slicing. This should be even faster because it avoids unnecessary slicing operations and compares raw memory directly....
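The Cython version is not shown in full here, but the underlying idea (byte-wise comparison without allocating intermediate slices) can be approximated in pure Python with a zero-copy `memoryview` (a sketch, not the issue's implementation):

```python
def starts_with(data: bytes, prefix: bytes) -> bool:
    """Prefix test without copying: memoryview slicing is zero-copy, and
    comparing a memoryview to bytes does a memcmp-style byte-wise check."""
    if len(prefix) > len(data):
        return False
    return memoryview(data)[: len(prefix)] == prefix
```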

~~~sql
SELECT *
FROM hits
WHERE URL LIKE '%google%'
ORDER BY EventTime
LIMIT 10;
~~~

This query performs a lot slower than in other engines. I'm not sure how they would...

In initial testing, 8k was too small and created too many calls; try larger numbers.
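The trade-off can be sketched with simple arithmetic (illustrative numbers only, not measurements):

```python
import math

def read_calls(total_records: int, chunk_size: int) -> int:
    """Number of read calls needed to fetch all records in fixed-size chunks."""
    return math.ceil(total_records / chunk_size)
```

For a million records, an 8k chunk size means 123 calls, while 64k brings that down to 16; larger chunks trade per-call overhead for memory per read.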

Combine adjacent filter steps into single steps.
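A sketch of such an optimizer pass, under a hypothetical plan representation where each step is a tuple like `("filter", predicate)` or `("scan",)` (not opteryx's actual plan nodes): consecutive filters merge into one conjunctive filter.

```python
def fuse_adjacent_filters(plan):
    """Merge runs of consecutive filter steps into one AND-combined filter."""
    fused = []
    for step in plan:
        if step[0] == "filter" and fused and fused[-1][0] == "filter":
            # Combine with the previous filter: both predicates must hold.
            prev = fused.pop()
            p1, p2 = prev[1], step[1]
            fused.append(("filter", lambda row, p1=p1, p2=p2: p1(row) and p2(row)))
        else:
            fused.append(step)
    return fused
```

Fusing avoids materializing an intermediate result between the two filter steps and evaluates both predicates in a single pass over each row.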