Weston Pace
Compared to the regular large binary encoding, this will have considerably less metadata (since there will be fewer pages). In the future it will also be possible to read just...
This happens while scanning a bunch of data in Python. The error (an S3 error in this case) is immediately propagated to Python and the reader is discarded. However, something...
Now that 2.0 is the default we should avoid making changes, even non-breaking feature changes, to make sure we work out all the kinks. There are still a number of...
Currently the v2 scheduler and decoder rely on calculating a priority for each I/O request. It is important that both the scheduler and the decoder agree on this priority. This...
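One way to picture the agreement requirement is to have both sides derive the priority from a single shared function. The sketch below is only an illustration, not the actual v2 scheduler: `request_priority`, `IoQueue`, and the "earlier rows first" rule are all hypothetical names and assumptions.

```python
import heapq


def request_priority(first_row: int) -> int:
    """Hypothetical shared rule: requests covering earlier rows run first."""
    return first_row


class IoQueue:
    """Toy I/O queue that serves requests in priority order."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal priorities keep submit order

    def submit(self, first_row: int, label: str) -> None:
        entry = (request_priority(first_row), self._counter, label)
        heapq.heappush(self._heap, entry)
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


# "Scheduler" side: requests are submitted out of order.
queue = IoQueue()
for row, label in [(2000, "page-c"), (0, "page-a"), (1000, "page-b")]:
    queue.submit(row, label)

# "Decoder" side: expects data in the order given by the same function.
served = [queue.pop() for _ in range(3)]
```

Because both sides call the same function, they cannot disagree; if each computed its own priority, the decoder could end up waiting on a request the queue considers low priority.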
We added a packed struct encoding in #2601 but it does not support variable-width columns. We should extend that encoding or add a new encoding which is capable of packing...
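To make the limitation concrete, here is a minimal sketch of packing fixed-width struct fields row by row, assuming "packed" means the fields of each row are stored contiguously. It uses Python's `struct` module and a made-up two-field layout (`int32`, `float64`); it is not the encoding from #2601.

```python
import struct

# Hypothetical struct layout: little-endian int32 followed by float64, no padding.
ROW = struct.Struct("<id")


def pack_rows(xs, ys):
    """Interleave two fixed-width columns into one row-major buffer."""
    return b"".join(ROW.pack(x, y) for x, y in zip(xs, ys))


def unpack_rows(buf):
    """Recover the rows; the fixed stride (ROW.size) makes this trivial."""
    return list(ROW.iter_unpack(buf))


buf = pack_rows([1, 2, 3], [0.5, 1.5, 2.25])
rows = unpack_rows(buf)
```

Row `i` always starts at byte `i * ROW.size`, so random access needs no extra metadata. A variable-width field breaks that fixed stride, which is why it needs either per-row lengths or offsets, or a different encoding.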
The zipped structural encoding is similar to the mini-block structural encoding in that we must first calculate the repetition and definition levels. However, the zipped structural encoding should be used...
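For readers unfamiliar with repetition and definition levels, the sketch below computes them for a nullable list of nullable integers. It follows the Dremel/Parquet numbering as an assumption; the level values Lance actually assigns may differ, and `compute_levels` is a hypothetical helper.

```python
def compute_levels(rows):
    """Return (repetition levels, definition levels, non-null values).

    Assumed Parquet-style numbering for list<int>:
      def 0 = null list, 1 = empty list, 2 = null item, 3 = valid item
      rep 0 = starts a new row, 1 = continues the current list
    """
    rep, dfn, values = [], [], []
    for row in rows:
        if row is None:
            rep.append(0)
            dfn.append(0)
        elif len(row) == 0:
            rep.append(0)
            dfn.append(1)
        else:
            for i, item in enumerate(row):
                rep.append(0 if i == 0 else 1)
                if item is None:
                    dfn.append(2)
                else:
                    dfn.append(3)
                    values.append(item)
    return rep, dfn, values


rep, dfn, values = compute_levels([[1, 2], None, [], [None, 3]])
```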
Pushdown filtering was prototyped but never fully implemented. One of the challenges is that pushdown filtering requires an "initialization" phase which could be combined as part of opening the file...
The mini-block structural encoding is useful for narrow data types and is capable of handling opaque compressive encodings (such as delta compression). The basic idea is that we store...
Currently the FSST encoding is only applied when the data is larger than 4MiB. This is an artificial limitation because I was seeing bugs when encoding smaller strings. In addition,...
Integer compression is very important. It's useful for integral types, decimal types, and temporal types. It also benefits string and binary types, as they store their offsets...
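As a rough illustration of why offsets compress well, the sketch below delta-encodes a run of increasing offsets and compares the bit width needed before and after. This is a generic delta plus bit-width calculation, not any particular Lance encoding; the helper names and the sample offsets are made up, and the helpers assume non-negative integers.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def bits_needed(values):
    """Smallest bit width that can hold every (non-negative) value."""
    return max((v.bit_length() for v in values), default=0)


# Hypothetical offsets for three short strings stored late in a buffer.
offsets = [1000, 1005, 1007, 1012]
deltas = delta_encode(offsets)

raw_bits = bits_needed(offsets)       # width needed for the raw offsets
delta_bits = bits_needed(deltas[1:])  # width for the deltas after the first
```

The raw offsets grow with the size of the buffer, while the deltas are bounded by the length of the longest string, so a bit-packed delta stream is typically much narrower.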