Florian Jetter
The current format specification was built with a multi-table dataset in mind and carries a lot of redundancies. This issue should collect requirements for a new `metadata_version=5` specification. xref...
The performance of the `io.cube.test_query` module is concerning: it takes up the majority of the runtime of the entire test suite. While this is a cornerstone of the cube functionality, the...
### Problem description The physical layout and indexing of a dataset strongly impact read performance. Often datasets are designed in such a way as to support a rather specific use case...
Arrow introduces two options which supposedly help with memory conservation. `self_destruct` frees each column as soon as it is converted, which renders the `pa.Table` object useless after the...
### Problem description The initial design for the indices was based on a version of this library where only single-value equality queries could be performed, see [here](https://github.com/JDASoftwareGroup/kartothek/blob/61ce401512e3a46969f1db56e2d2eec2f0c5b334/kartothek/core/dataset.py#L286). This motivated...
### Problem description The index build pipeline `build_dataset_indices__bag` may build indices of incompatible types when building an index for a date-typed column, leaving the dataset in...
### Problem description The hypothesis index tests are causing an OutOfBounds exception, e.g. https://travis-ci.com/JDASoftwareGroup/kartothek/jobs/254799676
We currently only support the deprecated query syntax for deletion scopes. It would be more intuitive to specify the deletion scope using the predicate syntax. Old syntax ``` update_dataset_from_ddf( new_ddf,...
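To illustrate the relationship between the two syntaxes, here is a sketch of how an old dict-style deletion scope could map onto the predicate (DNF) form: a list of OR'd conjunctions, each a list of `(column, op, value)` tuples. The helper name is hypothetical, not part of the kartothek API:

```python
# Hypothetical translation from the deprecated dict-style delete scope to the
# predicate syntax: each dict entry becomes one AND-conjunction of equality
# predicates, and the outer list is OR'd together.
def delete_scope_to_predicates(scope):
    return [[(col, "==", val) for col, val in entry.items()] for entry in scope]

old_scope = [{"date": "2020-01-01", "country": "DE"}]
predicates = delete_scope_to_predicates(old_scope)
# predicates == [[("date", "==", "2020-01-01"), ("country", "==", "DE")]]
```

This also shows why the predicate syntax is more expressive: it admits operators other than equality, which the dict form cannot represent.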
### Problem description The current partition pruning mechanism relies on the `Index.eval_operator` method, which *always* converts the dictionary to an array before evaluating the predicates. This conversion takes up most...
We're using Apache Arrow as the ultimate tool to glue everything together. When writing data, we accept pandas DataFrames, convert them to Arrow tables, and store them as Parquet. When...