opteryx
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
### Thank you for taking the time to report a problem with Opteryx. _To help us respond to your request, please provide the following details about the bug._ --- **Describe...
Use bit masks as the return value of the main comparison ops (eq/gt/etc.), and combine them (and/or/xor) in Cython before returning a single mask to the Python code.
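To make the idea concrete, here is a minimal pure-Python sketch of the mask flow. It is an illustration only, not Opteryx code: the names `compare`, `combine` and `select` are hypothetical, plain Python integers stand in for the bit masks, and the loops stand in for what would be compiled Cython.

```python
import operator

# Hypothetical lookup tables for this sketch; not Opteryx's operator registry.
COMPARISONS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}
COMBINERS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def compare(values, op, literal):
    """Evaluate one comparison; bit i of the result is set when row i matches."""
    test = COMPARISONS[op]
    mask = 0
    for i, value in enumerate(values):
        if test(value, literal):
            mask |= 1 << i
    return mask


def combine(left, right, how):
    """Combine two bit masks without materialising per-row booleans."""
    return COMBINERS[how](left, right)


def select(values, mask):
    """Apply the final mask once, at the boundary back to calling code."""
    return [v for i, v in enumerate(values) if (mask >> i) & 1]


ids = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Equivalent to: id > 4 OR id = 1
mask = combine(compare(ids, "gt", 4), compare(ids, "eq", 1), "or")
selected = select(ids, mask)
```

The point of the design is that intermediate results stay as compact masks and only the final combined mask crosses back into Python.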
NumPy was a great way to get well-performing code written quickly. It allowed us to build a reasonably fast engine in Python in a short time. But we're...
CAST calls to be rewritten from functions (e.g. `INTEGER(Val)`) to ops (e.g. `val CAST_TO INTEGER`). Ops are faster because they skip the Arrow to NumPy conversion and where possible...
If we have stats for all of the files being read, we can respond to `COUNT(*)`, `MIN(col)` and `MAX(col)` from just the stats.
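A rough sketch of that shortcut follows. It assumes the query has no filter and that each file carries a row count plus per-column minimum and maximum values. `FileStats` and `answer_from_stats` are hypothetical names for illustration, not Opteryx's actual structures.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FileStats:
    """Hypothetical per-file statistics; not Opteryx's real stats object."""
    row_count: int
    min_values: Dict[str, Any] = field(default_factory=dict)
    max_values: Dict[str, Any] = field(default_factory=dict)


def answer_from_stats(
    stats: List[Optional[FileStats]],
    aggregate: str,
    column: Optional[str] = None,
):
    """Return the aggregate from stats alone, or None if files must be read."""
    # Every file needs stats; a single missing entry forces a real scan.
    if not stats or any(s is None for s in stats):
        return None
    if aggregate == "COUNT":
        return sum(s.row_count for s in stats)
    if aggregate in ("MIN", "MAX"):
        source = "min_values" if aggregate == "MIN" else "max_values"
        per_file = [getattr(s, source).get(column) for s in stats]
        # A file without a value for this column also forces a scan.
        if any(v is None for v in per_file):
            return None
        return min(per_file) if aggregate == "MIN" else max(per_file)
    return None


files = [
    FileStats(3, {"id": 1}, {"id": 3}),
    FileStats(6, {"id": 4}, {"id": 9}),
]
count_all = answer_from_stats(files, "COUNT")
min_id = answer_from_stats(files, "MIN", "id")
max_id = answer_from_stats(files, "MAX", "id")
```

Returning `None` when any file lacks stats keeps the fallback explicit: the caller reads the files as usual.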
Rewriting chains of OR conditions into a single operator is a good way to improve query performance. When users write `ARRAY_CONTAINS(x, 1) OR ARRAY_CONTAINS(x, 2)` we can rewrite this to `x...
[Roaring Bitmaps](https://github.com/RoaringBitmap/RoaringBitmap) are meant to be faster than sets, but only hold 32-bit values. We should test how much faster they are for FILTER joins and DISTINCT.
~~~sql
SELECT SUM(id > 4 OR id = 1) FROM $planets
~~~

> KeyError: 'Field "13f8019749b7c586" does not exist in schema'
I think we currently only create new filters when we have only a single blob to read (e.g. on an inner join when we know the join only matches values...