Add support for dictionaries shared across multiple columns

Open aditanase opened this issue 1 month ago • 1 comments

One interesting feature from the F3 paper is shared dictionaries, across a combination of columns. The extreme version of this would be a single dictionary referenced by all the columns. There is some related work in the C3 repo as well: https://github.com/cwida/C3

I am assuming this is something that the vortex layout could accommodate. Any pointers on how to approach this with the current extensibility layers?

aditanase avatar Nov 08 '25 08:11 aditanase

Hi @aditanase !

Thanks for creating this issue! Issue #2657 tracks (a portion of) our string wishlist.

In the single-column case, what you've described above is implemented as the DictLayout (see this folder in vortex-layout). The dictionary layout has two child layouts: values and codes. The values child is the dictionary, and the codes are indices into it. The codes can be (and, indeed, in the default btrblocks-style compressor are) stored as a ChunkedLayout, which permits either streamed or partitioned reading of the codes separately from the values.
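For readers new to the idea, here is a minimal sketch of the values/codes split described above, in plain Python. It is not the Vortex API: `dict_encode`, `chunk_codes`, and `decode_chunk` are hypothetical names, and the fixed-length chunking only stands in for what a ChunkedLayout makes possible.

```python
from typing import Hashable


def dict_encode(column: list[Hashable]) -> tuple[list[Hashable], list[int]]:
    """Split a column into (values, codes) so that values[codes[i]] == column[i]."""
    values: list[Hashable] = []       # the dictionary: each distinct item once
    index: dict[Hashable, int] = {}   # item -> its position in `values`
    codes: list[int] = []
    for item in column:
        code = index.get(item)
        if code is None:
            code = len(values)
            index[item] = code
            values.append(item)
        codes.append(code)
    return values, codes


def chunk_codes(codes: list[int], chunk_len: int) -> list[list[int]]:
    """Cut the codes into fixed-length chunks (hypothetical stand-in for chunked storage)."""
    return [codes[i:i + chunk_len] for i in range(0, len(codes), chunk_len)]


def decode_chunk(values: list[Hashable], code_chunk: list[int]) -> list[Hashable]:
    """Decode one chunk of codes against the shared values, without touching other chunks."""
    return [values[c] for c in code_chunk]


column = ["red", "green", "red", "blue", "green", "red"]
values, codes = dict_encode(column)
code_chunks = chunk_codes(codes, chunk_len=4)
```

Because the codes are chunked independently of the values, a reader can decode `code_chunks[1]` alone, which is the kind of partitioned read mentioned above.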

> The extreme version of this would be a single dictionary referenced by all the columns.

Yeah, this would be very cool! We are not currently working on that, though we're aware of the F3 paper [1]. The Vortex community is eager to welcome new open source contributors! I think the best way to get started is to propose a design. There's also now a Slack community you can join here.

The DictLayout is probably the best place to start. A MultiColumnDictLayout should look similar. Maybe it's exactly a DictLayout where the codes are required to be a StructLayout? That might require some kind of MergeLayout to stitch together the non-multi-column-dict columns with the multi-column-dict columns.
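To make the shape of that idea concrete, here is a rough Python sketch, assuming one values list shared by several columns and a struct-like mapping of per-column codes. Everything in it is hypothetical: `shared_dict_encode` and `merge_columns` are illustrative names, not existing Vortex layouts or functions, and the sample data is made up.

```python
from typing import Hashable


def shared_dict_encode(
    columns: dict[str, list[Hashable]],
) -> tuple[list[Hashable], dict[str, list[int]]]:
    """Encode several columns against ONE dictionary.

    Returns (values, struct_codes), where struct_codes maps each column name
    to its codes and values[struct_codes[name][i]] == columns[name][i].
    """
    values: list[Hashable] = []
    index: dict[Hashable, int] = {}
    struct_codes: dict[str, list[int]] = {}
    for name, column in columns.items():
        col_codes = []
        for item in column:
            # setdefault returns the existing code, or assigns the next free one.
            code = index.setdefault(item, len(values))
            if code == len(values):   # the item was new, so record its value
                values.append(item)
            col_codes.append(code)
        struct_codes[name] = col_codes
    return values, struct_codes


def merge_columns(
    values: list[Hashable],
    struct_codes: dict[str, list[int]],
    plain_columns: dict[str, list],
) -> dict[str, list]:
    """Stitch decoded shared-dictionary columns together with untouched columns."""
    decoded = {name: [values[c] for c in codes] for name, codes in struct_codes.items()}
    return {**decoded, **plain_columns}


dict_columns = {
    "origin": ["SFO", "JFK", "SFO"],
    "destination": ["JFK", "SFO", "LAX"],
}
plain_columns = {"seats": [1, 2, 3]}  # a column left out of the shared dictionary

values, struct_codes = shared_dict_encode(dict_columns)
table = merge_columns(values, struct_codes, plain_columns)
```

Here "SFO" and "JFK" are stored once even though both columns use them, and `merge_columns` plays the role the hypothetical MergeLayout would play: recombining the shared-dictionary columns with the remaining ones.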

[1] For anyone else stumbling on this issue, the paper is: Zeng, et al., "F3: The Open-Source Data File Format for the Future" https://db.cs.cmu.edu/papers/2025/zeng-sigmod2025.pdf .

danking avatar Nov 12 '25 20:11 danking