Michael Jungmair issues

Results: 20 issues of Michael Jungmair

make standard config run in docker

Handle `in` expressions with a large right side efficiently

Currently, `in` clauses are executed naively when the right side is a list of scalar values: even if there are hundreds of values, we generate hundreds of comparisons. This not only hurts...
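One common way to avoid a chain of comparisons is to build a hash set from the constant list once and probe it per row. The following is a minimal sketch of that idea in Python, not LingoDB's implementation; the names `in_naive` and `make_in_hashed` are hypothetical, and SQL `NULL` semantics are deliberately ignored.

```python
from typing import Callable, Hashable, Iterable, List


def in_naive(value: Hashable, candidates: List[Hashable]) -> bool:
    """Evaluate `value IN (...)` as a chain of equality checks: O(n) per row."""
    for candidate in candidates:
        if value == candidate:
            return True
    return False


def make_in_hashed(candidates: Iterable[Hashable]) -> Callable[[Hashable], bool]:
    """Build the lookup set once; each probe is then expected O(1)."""
    lookup = frozenset(candidates)
    return lambda value: value in lookup


# Illustrative data: an IN list with hundreds of scalar values.
in_list = list(range(0, 1000, 2))
probe = make_in_hashed(in_list)

rows = [3, 4, 998, 1000]
matches = [row for row in rows if probe(row)]
```

The set is built once per query rather than once per row, so the cost of hashing the list is amortized over all probed rows.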

Treatment of chars in LingoDB is not ideal

Currently, char types (fixed-width strings) are treated as follows in LingoDB: For up to a length of 8 bytes/chars, integers of appropriate width are used to represent chars below the...

Generate MLIR documentation

It would be great if we could have documentation for LingoDB's MLIR dialects similar to that of [MLIR](https://mlir.llvm.org/docs/Dialects/), and could automate the process split into two parts. 1....

github-infrastructure

Query Optimization: Pass Pipeline needs refactoring

Problems:
- Too many passes/iterations over the IR, which increases optimization time
- Some of the passes are executed multiple times, but are not idempotent. This leads to problems further...

Query Optimization: also consider distinct values for estimating (join) selectivities

For joins, we currently do not take the number of distinct values into account. Especially for categorical data stored e.g. in strings, our estimates are completely off. Also: we could...
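For illustration, the textbook estimate for an equi-join divides the cross-product size by the larger of the two distinct-value counts. The sketch below assumes uniformly distributed values and that every value on the smaller side has a match on the other side; it is not LingoDB's cost model, and the function name is hypothetical.

```python
def estimate_equi_join_cardinality(
    rows_left: int,
    rows_right: int,
    distinct_left: int,
    distinct_right: int,
) -> float:
    """Estimate |R JOIN S| as |R| * |S| / max(V(R, a), V(S, b))."""
    if min(rows_left, rows_right, distinct_left, distinct_right) <= 0:
        return 0.0
    return rows_left * rows_right / max(distinct_left, distinct_right)


# Illustrative numbers: a large table with a categorical string column
# (7 distinct values) joined with a 7-row lookup table on that column.
estimate = estimate_equi_join_cardinality(
    rows_left=1_000_000,
    rows_right=7,
    distinct_left=7,
    distinct_right=7,
)
```

With few distinct values, the estimate stays close to the size of the large input, whereas an estimate that ignores distinct counts can be off by orders of magnitude.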

query-optimization

try using mimalloc allocator

DATE_TRUNC returns epoch nanoseconds instead of timestamp/date type

The current implementation returns `int64_t` (epoch nanoseconds) instead of preserving the input timestamp type, as PostgreSQL does. The function should (probably) return the same timestamp type as its input argument.
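The intended behavior can be sketched in Python with `datetime`: the truncated value has the same type as the input rather than being a raw integer. This is only an illustration of the semantics, not LingoDB code; the helper `date_trunc` and its set of supported units are assumptions.

```python
from datetime import datetime


def date_trunc(unit: str, ts: datetime) -> datetime:
    """Truncate `ts` to `unit`, returning a datetime (the input's type)."""
    if unit == "year":
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "minute":
        return ts.replace(second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


truncated = date_trunc("month", datetime(2024, 3, 15, 13, 45, 10))
```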

compilation

SQL Frontend: reject erroneous queries

At the moment, erroneous queries are sometimes not rejected by the frontend and usually fail later during compilation. Example: `select l_shipmode, count(*) from lineitem`. This should be fixed. Additionally,...
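The example query is invalid in standard SQL because `l_shipmode` is neither aggregated nor listed in a `GROUP BY` clause. A frontend check for this class of error could look roughly like the sketch below; the types `SelectItem` and `SemanticError` and the function `check_grouping` are hypothetical and operate on a toy representation, not on LingoDB's actual AST.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


class SemanticError(Exception):
    """Raised when a query is rejected during semantic analysis."""


@dataclass
class SelectItem:
    column: Optional[str]  # None for expressions such as count(*)
    is_aggregate: bool


def check_grouping(items: List[SelectItem], group_by: Set[str]) -> None:
    """Reject plain columns that appear next to aggregates without grouping."""
    has_aggregate = any(item.is_aggregate for item in items)
    if not has_aggregate and not group_by:
        return  # plain projection, nothing to check
    for item in items:
        if not item.is_aggregate and item.column not in group_by:
            raise SemanticError(
                f"column '{item.column}' must appear in GROUP BY "
                "or be used in an aggregate function"
            )


# select l_shipmode, count(*) from lineitem
bad_query = [SelectItem("l_shipmode", False), SelectItem(None, True)]
try:
    check_grouping(bad_query, group_by=set())
    rejected = False
except SemanticError:
    rejected = True

# select l_shipmode, count(*) from lineitem group by l_shipmode
check_grouping(bad_query, group_by={"l_shipmode"})
```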

sql-frontend

Improve LingoDBTable/HashIndex implementation

**1. directly compute hashes for column values in runtime** Currently, hashes are calculated using an embedded SQL query: https://github.com/lingo-db/lingo-db/blob/aa3a3610c503aa8deb6ae88646448474f9f9683b/src/runtime/LingoDBHashIndex.cpp#L54 This introduces quite some overhead... **2. don't use arrow function to...

runtime