opteryx issues

✨ Emit large files in blocks from readers

Large files cause slowdowns as they are processed. This may be because large amounts of memory are being allocated and deallocated at each step. We should split large files...
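A minimal sketch of the idea, assuming a reader that can produce rows lazily. The names `read_in_blocks` and `block_size` are hypothetical and are not part of Opteryx; the point is only that each step sees a bounded block rather than the whole file.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_blocks(rows: Iterable[dict], block_size: int = 1000) -> Iterator[List[dict]]:
    """Yield rows in fixed-size blocks rather than as one large batch.

    Hypothetical helper for illustration; not an Opteryx API.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls at most block_size rows, so memory use is bounded per block
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


# Hypothetical usage: 2,500 rows emitted as blocks of at most 1,000 rows.
rows = ({"id": i} for i in range(2500))
block_sizes = [len(block) for block in read_in_blocks(rows, block_size=1000)]
```

In this sketch the final block is simply shorter than the others; a real reader might prefer to size blocks by bytes rather than by row count.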

joocer

✨ Operator Nodes shouldn't be planning in the init function

- [ ] aggregate node
- [ ] group and aggregate node
- [ ] cross join node

joocer

🪲 Can't use EXTRACT statement with a FOR clause

~~~sql
SELECT COUNT(*), EXTRACT(HOUR FROM dte)
  FROM table
   FOR date
 GROUP BY EXTRACT(HOUR FROM dte)
~~~

The `FROM` inside the `EXTRACT` expression confuses the temporal filters.
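One possible way to tell the two kinds of `FROM` apart is to track parenthesis depth and only treat keywords at depth zero as clause boundaries. The sketch below is an illustration under that assumption, not how Opteryx parses temporal filters; `top_level_keywords` is a hypothetical helper and it deliberately ignores string literals and comments.

```python
import re
from typing import List, Sequence, Tuple


def top_level_keywords(sql: str, keywords: Sequence[str] = ("FROM", "FOR")) -> List[Tuple[str, int]]:
    """Return (keyword, position) pairs for keywords found outside parentheses.

    Hypothetical helper for illustration; string literals and comments
    are not handled.
    """
    found = []
    depth = 0
    # Tokens of interest: opening parenthesis, closing parenthesis, or a word
    for match in re.finditer(r"\(|\)|\b\w+\b", sql):
        token = match.group(0)
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif depth == 0 and token.upper() in keywords:
            found.append((token.upper(), match.start()))
    return found


query = (
    "SELECT COUNT(*), EXTRACT(HOUR FROM dte) "
    "FROM table FOR date GROUP BY EXTRACT(HOUR FROM dte)"
)
clause_keywords = [keyword for keyword, _ in top_level_keywords(query)]
```

With this approach the two `FROM` tokens inside `EXTRACT(...)` are skipped because they sit at depth one.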

joocer

Bug 🪲

🪲 SELECT DISTINCT ... ORDER BY - doesn't correctly DISTINCT if ORDER BY columns aren't in the SELECT

### Thank you for taking the time to report a problem with Opteryx. _To help us to respond to your request we ask that you try to provide the below...

joocer

✨ Each step in the physical plan should report statistics

- execution time
- rows in
- rows out
- start time

This will allow us to create a better representation of the execution for debugging performance.
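The four statistics above could be collected by wrapping each step, as in this minimal sketch. `StepStatistics` and `run_step` are hypothetical names, and the sketch assumes a step is a callable that takes and returns rows; Opteryx's actual plan nodes may look quite different.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class StepStatistics:
    """Hypothetical per-step statistics record; not an Opteryx class."""
    name: str
    start_time: float = 0.0      # wall-clock time when the step started
    execution_time: float = 0.0  # seconds spent executing the step
    rows_in: int = 0
    rows_out: int = 0


def run_step(
    name: str,
    step: Callable[[List[dict]], Iterable[dict]],
    rows: List[dict],
) -> Tuple[List[dict], StepStatistics]:
    """Execute one step and record its statistics alongside the result."""
    stats = StepStatistics(name=name, rows_in=len(rows))
    stats.start_time = time.time()
    started = time.perf_counter()  # monotonic clock for measuring duration
    result = list(step(rows))
    stats.execution_time = time.perf_counter() - started
    stats.rows_out = len(result)
    return result, stats


rows = [{"value": i} for i in range(10)]
filtered, stats = run_step(
    "selection", lambda rs: [r for r in rs if r["value"] % 2 == 0], rows
)
```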

joocer

✨ Stats should include when optimizations have been made

This will allow us to test that the optimizations have not regressed. Because they are functionally transparent, it's hard to spot when they aren't being applied; we can address this by...
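As an illustration of the general idea only (the issue text is truncated, so this is not necessarily the approach it goes on to propose), an optimization could increment a named counter each time it is applied, and a test could then assert on that counter. The optimization, the plan representation, and the counter name below are all hypothetical.

```python
from collections import Counter
from typing import List


def remove_redundant_steps(plan: List[str], stats: Counter) -> List[str]:
    """Toy optimization: collapse adjacent duplicate steps.

    Each time a step is dropped, a counter is incremented so a test can
    confirm the optimization was actually applied.
    """
    optimized: List[str] = []
    for step in plan:
        if optimized and optimized[-1] == step:
            stats["optimization_remove_redundant_steps"] += 1
            continue
        optimized.append(step)
    return optimized


stats = Counter()
plan = remove_redundant_steps(["scan", "distinct", "distinct", "project"], stats)
```

A regression test can then check both the result and the counter, which catches the case where the output is still correct but the optimization silently stopped firing.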

joocer

✨ add latches (locks) to items in the bufferpool to make them read only during use

We currently materialize items we read from the buffer pool to byte arrays so that if the item is removed or moved while in use, we're working on a copy...
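A minimal sketch of what a latch could look like, assuming items are stored as immutable `bytes` and that eviction is the only operation that removes them. `LatchedPool` is a hypothetical toy class, not the Opteryx buffer pool; it hands out a read-only `memoryview` instead of a copy and refuses to evict an item while any reader holds a latch on it.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class LatchedPool:
    """Toy buffer pool: an item cannot be evicted while a reader holds a latch."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}
        self._latches: Dict[str, int] = {}  # number of active readers per key
        self._lock = threading.Lock()

    def put(self, key: str, value: bytes) -> None:
        with self._lock:
            self._items[key] = value

    @contextmanager
    def read(self, key: str) -> Iterator[memoryview]:
        # Take the latch under the lock, then release the lock while reading
        with self._lock:
            value = self._items[key]
            self._latches[key] = self._latches.get(key, 0) + 1
        try:
            # memoryview over bytes is read-only and does not copy the data
            yield memoryview(value)
        finally:
            with self._lock:
                self._latches[key] -= 1

    def evict(self, key: str) -> bool:
        """Remove the item unless it is latched; return True if evicted."""
        with self._lock:
            if self._latches.get(key, 0) > 0:
                return False
            self._items.pop(key, None)
            return True


pool = LatchedPool()
pool.put("blob", b"abc")
with pool.read("blob") as view:
    evicted_while_latched = pool.evict("blob")
    first_byte = view[0]
evicted_after_release = pool.evict("blob")
```

The latch here is a simple reader count rather than a true lock, so many readers can share an item; a caller that gets `False` from `evict` would need to retry later or pick a different victim.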

joocer

✨ support splunk

### Thanks for stopping by to let us know something could be better! **Is your feature request related to a problem? Please describe.** _A clear and concise description of what...

joocer

✨ Buffer Pool should save in PARQUET to reduce read overhead

Regardless of the format that the file is in, when serializing for the buffer pool it should be saved as a Parquet file (unless we can make another, faster format),...

joocer

✨ Run IO in a separate process

IO in a separate process should improve throughput. Although IO is increasingly not the bottleneck (processing is), this is a first step towards being able to multiprocess.

joocer
