opteryx
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
To help with understanding query execution, provide the ability to draw an executed query plan. ChatGPT has suggested this code will help draw the structure: ~~~python from collections import defaultdict...
Split plans into pipelines (series of stateless ops). Parallelize these pipelines. Merge at pipeline breakers, potentially setting off a new series of pipelines. This would be another planning step, after...
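As a rough illustration of the splitting step only, the sketch below cuts a linear list of operators into pipelines, closing a pipeline whenever it reaches a breaker. The operator names and the `PIPELINE_BREAKERS` set are hypothetical and do not come from Opteryx's planner; a real plan is a graph, and parallelizing and merging are not shown.

```python
from typing import List, Set

# Hypothetical operator names; which operators count as breakers is an assumption.
PIPELINE_BREAKERS: Set[str] = {"sort", "aggregate", "join_build"}


def split_into_pipelines(
    plan: List[str], breakers: Set[str] = PIPELINE_BREAKERS
) -> List[List[str]]:
    """Split a linear operator list into pipelines, ending each one at a breaker."""
    pipelines: List[List[str]] = []
    current: List[str] = []
    for op in plan:
        current.append(op)
        if op in breakers:
            # A breaker must see all of its input before emitting, so the
            # pipeline ends here and a new one starts with the next operator.
            pipelines.append(current)
            current = []
    if current:
        pipelines.append(current)
    return pipelines


example_plan = ["scan", "filter", "project", "aggregate", "filter", "sort", "limit"]
example_pipelines = split_into_pipelines(example_plan)
```

Each inner list is a run of operators that could, in principle, be executed morsel by morsel without coordination until the breaker at its end.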
After #2239, use the cost model and selectivity estimates to order predicates.
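One common way to combine the two estimates is the textbook rank heuristic: evaluate first the predicates that discard the most rows per unit of cost. The sketch below assumes this heuristic; it is not taken from the issue, and the `Predicate` class, its fields, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Predicate:
    name: str
    cost: float         # estimated per-row evaluation cost (arbitrary units)
    selectivity: float  # estimated fraction of rows that pass, in [0, 1]


def rank(p: Predicate) -> float:
    """Cost paid per unit of rows eliminated; lower is better."""
    dropped = 1.0 - p.selectivity
    if dropped <= 0.0:
        # A predicate that filters nothing is pure cost, so it goes last.
        return float("inf")
    return p.cost / dropped


def order_predicates(preds: List[Predicate]) -> List[Predicate]:
    return sorted(preds, key=rank)


ordered = order_predicates([
    Predicate("name LIKE '%a%'", cost=10.0, selectivity=0.5),
    Predicate("id = 7", cost=1.0, selectivity=0.01),
    Predicate("flag IS NOT NULL", cost=1.0, selectivity=0.9),
])
ordered_names = [p.name for p in ordered]
```

The heuristic assumes predicates are independent; correlated predicates would need joint selectivity estimates.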
When we have multiple predicates on a single column (including ones we've added, or that are implied from statistics), we may be able to reduce the number of predicates we...
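For range comparisons, one possible reduction is to keep only the tightest lower bound and the tightest upper bound on a column. The sketch below assumes predicates are simple `(operator, value)` pairs on one numeric column; this representation and the `tighten` helper are hypothetical, and equality, `IN`, and contradiction detection are not handled.

```python
from typing import List, Optional, Tuple

Bound = Tuple[str, float]  # (operator, literal), e.g. (">", 5)


def tighten(preds: List[Bound]) -> List[Bound]:
    """Collapse range predicates on one column to at most two bounds."""
    lower: Optional[Bound] = None
    upper: Optional[Bound] = None
    for op, value in preds:
        if op in (">", ">="):
            # A larger value is tighter; on a tie the strict operator wins.
            if lower is None or value > lower[1] or (value == lower[1] and op == ">"):
                lower = (op, value)
        elif op in ("<", "<="):
            # A smaller value is tighter; on a tie the strict operator wins.
            if upper is None or value < upper[1] or (value == upper[1] and op == "<"):
                upper = (op, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return [b for b in (lower, upper) if b is not None]


reduced = tighten([(">", 5), (">=", 10), ("<", 100), ("<=", 50), (">", 10)])
```

Here five predicates collapse to two, so each row is tested against two comparisons instead of five.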
Semi joins will probably benefit a lot from a prefilter. Left/right outer joins may also benefit.
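The issue does not say what kind of prefilter is meant. As one possibility, the sketch below uses a small hash bitmap built from the right-side keys: it can report false positives but never false negatives, so rows it rejects can be skipped before the exact check. The function names, the bitmap size, and the use of `zlib.crc32` are illustrative assumptions, not Opteryx code.

```python
import zlib
from typing import Iterable, List


def build_bitmap(keys: Iterable[str], bits: int = 1024) -> bytearray:
    """Set one bit per key hash; `bits` is assumed to be a multiple of 8."""
    bitmap = bytearray(bits // 8)
    for key in keys:
        h = zlib.crc32(key.encode("utf-8")) % bits
        bitmap[h // 8] |= 1 << (h % 8)
    return bitmap


def might_contain(bitmap: bytearray, key: str) -> bool:
    """False means definitely absent; True means possibly present."""
    bits = len(bitmap) * 8
    h = zlib.crc32(key.encode("utf-8")) % bits
    return bool(bitmap[h // 8] & (1 << (h % 8)))


def semi_join(left: List[str], right: List[str]) -> List[str]:
    bitmap = build_bitmap(right)
    right_keys = set(right)
    # Cheap prefilter first, then the exact membership test on the survivors.
    candidates = [k for k in left if might_contain(bitmap, k)]
    return [k for k in candidates if k in right_keys]


matched = semi_join(["a", "b", "c", "a"], ["a", "c", "z"])
```

In this toy version the exact check is also a set lookup, so the prefilter saves little; the benefit would come when the exact probe is expensive, for example hashing long VARCHAR values into a join table.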
We have metrics and estimates for most table types now, so we should be able to use these to make decisions.
The initial implementation only triggers if there's one VARCHAR/BLOB join condition. It should trigger for any number of conditions, as long as one or more of them is VARCHAR/BLOB.
mypy errors have been around for a while and haven't been fixed
From the BuzzHouse article: run the same query with different settings, such as the number of threads, enable/disable external sorting or grouping, or set a different join algorithm. https://clickhouse.com/blog/buzzhouse-bridging-the-database-fuzzing-gap-for-testing-clickhouse
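A minimal sketch of that idea as a test harness: run one query under several settings and report the settings whose result differs from the first run. The `run_query` callable, the settings keys, and `fake_engine` are hypothetical stand-ins, not Opteryx's API; results are compared as sorted rows, which assumes the query has no required ordering.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, ...]
Settings = Dict[str, Any]


def find_disagreements(
    run_query: Callable[[str, Settings], List[Row]],
    sql: str,
    settings_list: List[Settings],
) -> List[Settings]:
    """Return the settings whose result differs from the first settings' result."""
    baseline = sorted(run_query(sql, settings_list[0]), key=repr)
    mismatched: List[Settings] = []
    for settings in settings_list[1:]:
        if sorted(run_query(sql, settings), key=repr) != baseline:
            mismatched.append(settings)
    return mismatched


def fake_engine(sql: str, settings: Settings) -> List[Row]:
    """Toy engine with a deliberate bug: it drops a row when threads > 2."""
    rows: List[Row] = [(1, "a"), (2, "b"), (3, "c")]
    if settings.get("threads", 1) > 2:
        return rows[:-1]
    if settings.get("external_sort"):
        return list(reversed(rows))  # different order, same rows
    return rows


bad_settings = find_disagreements(
    fake_engine,
    "SELECT * FROM t",
    [{"threads": 1}, {"threads": 2, "external_sort": True}, {"threads": 4}],
)
```

The reordered result under `external_sort` is accepted, while the run with four threads is flagged because a row is missing.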
warning: opteryx/compiled/structures/node.pyx:119:22: Strings should no longer be used for type declarations. Use 'cython.int' etc. directly.
performance hint: opteryx/compiled/functions/murmurhash3_32.pxd:6:28: No exception value declared for 'cy_murmurhash3' in pxd file. Users cimporting this...