opteryx
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
Large files cause slowdowns as they are processed. This may be because large amounts of memory are being allocated and deallocated at each step. We should split large files...
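One way to picture the splitting idea is the sketch below. It assumes that "splitting" means cutting the rows of a large file into fixed-size batches so that each step only touches a bounded amount of memory. The function name `split_into_chunks` and the `chunk_size` parameter are hypothetical and are not taken from the Opteryx code base.

```python
from typing import Iterator, List, Sequence


def split_into_chunks(rows: Sequence, chunk_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `chunk_size` rows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(rows), chunk_size):
        # Each yielded list is small, so downstream steps allocate less at once.
        yield list(rows[start:start + chunk_size])


# Ten rows split into batches of four: two full batches and one partial batch.
chunks = list(split_into_chunks(list(range(10)), chunk_size=4))
```

Because the helper is a generator, only one chunk needs to be alive at a time when the caller iterates lazily rather than collecting the chunks into a list.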
- [ ] aggregate node
- [ ] group and aggregate node
- [ ] cross join node
~~~sql
SELECT COUNT(*), EXTRACT(HOUR FROM dte)
  FROM table
   FOR date
 GROUP BY EXTRACT(HOUR FROM dte)
~~~

The `FROM` inside the `EXTRACT` expression confuses the temporal filters.
🪲 SELECT DISTINCT ... ORDER BY - DISTINCT isn't applied correctly if the ORDER BY columns aren't in the SELECT
### Thank you for taking the time to report a problem with Opteryx.

_To help us to respond to your request we ask that you try to provide the below...
- execution time
- rows in
- rows out
- start time

This will allow us to create a better representation of the execution for debugging performance.
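The four measurements listed above could be collected with a thin wrapper around each operator. The sketch below is only an illustration: `NodeStats`, `run_instrumented`, and the `keep_even` operator are hypothetical names, and it assumes an operator is a plain function that maps one batch (a list of rows) to another.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class NodeStats:
    """Per-node metrics (hypothetical structure, not Opteryx's own)."""
    name: str
    start_time: float = 0.0      # wall-clock time the node started
    execution_time: float = 0.0  # seconds spent inside the operator
    rows_in: int = 0
    rows_out: int = 0


def run_instrumented(
    name: str,
    operator: Callable[[List], List],
    batches: Iterable[List],
) -> Tuple[List[List], NodeStats]:
    """Run `operator` over every batch and record the four metrics."""
    stats = NodeStats(name=name, start_time=time.time())
    results = []
    for batch in batches:
        stats.rows_in += len(batch)
        began = time.perf_counter()
        out = operator(batch)
        stats.execution_time += time.perf_counter() - began
        stats.rows_out += len(out)
        results.append(out)
    return results, stats


def keep_even(batch: List[int]) -> List[int]:
    """Toy filter operator used only for the demonstration."""
    return [value for value in batch if value % 2 == 0]


results, stats = run_instrumented("filter", keep_even, [[1, 2, 3], [4, 5, 6]])
```

Timing only the call to the operator, rather than the whole loop, keeps the time spent producing the input batches out of the node's own figure.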
This will allow us to test that the optimizations have not regressed... Because they are functionally transparent, it's hard to spot when they aren't being applied; we can address this by...
We currently materialize items we read from the buffer pool to byte arrays so that if the item is removed or moved while in use, we're working on a copy...
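The trade-off between a copy and a reference can be seen with Python's built-in types. This is a minimal sketch that uses a `bytearray` as a hypothetical stand-in for buffer pool memory; it does not reflect how the Opteryx buffer pool is actually implemented.

```python
# Hypothetical stand-in for a region of buffer pool memory.
pool = bytearray(b"hello world")

view = memoryview(pool)[0:5]  # zero-copy reference into the pool
copy = bytes(view)            # materialized copy, independent of the pool

# The pool reuses the slot while the item is still "in use".
pool[0:5] = b"HELLO"

view_after = bytes(view)      # the reference now sees the new contents
```

After the overwrite, `copy` still holds the original bytes while `view_after` reflects the change, which is the hazard that materializing protects against at the cost of an extra allocation per read.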
### Thanks for stopping by to let us know something could be better!

**Is your feature request related to a problem? Please describe.**

_A clear and concise description of what...
Regardless of the format that the file is in, when serializing for the buffer pool it should be saved as a Parquet file (unless we can make another, faster format),...
Performing IO in a separate process should improve throughput. Although IO is increasingly not the bottleneck (processing is), this is a first step towards being able to multiprocess.
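A rough sketch of how reads might be handed to another process with the standard library follows. The names `read_file` and `read_all` are hypothetical. The helper accepts any `concurrent.futures.Executor`, on the assumption that the same code path should work with threads during testing and with a process pool in production.

```python
from concurrent.futures import Executor
from pathlib import Path
from typing import Dict, Iterable


def read_file(path: str) -> bytes:
    """Read one file fully. Defined at module level so a process pool can pickle it."""
    return Path(path).read_bytes()


def read_all(paths: Iterable[str], executor: Executor) -> Dict[str, bytes]:
    """Read every path using the supplied executor and map path -> contents."""
    paths = list(paths)
    # Executor.map preserves input order, so zip pairs each path with its blob.
    return dict(zip(paths, executor.map(read_file, paths)))
```

Passing a `concurrent.futures.ProcessPoolExecutor` to `read_all`, from code running under an `if __name__ == "__main__":` guard, moves the reads into worker processes. The file contents are then pickled back to the parent, so this only helps when that transfer costs less than the read it overlaps.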