otel-arrow

Declare sorted columns in Arrow Schema to enable further optimizations

Open lquerel opened this issue 1 year ago • 0 comments

At present, the sorted columns list for each Arrow record type is hardcoded. However, by designating this list as metadata within the Arrow Schema for each record, we pave the way for advanced optimizations.
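A minimal sketch of the idea, using a plain Python dict as a stand-in for the string key/value metadata that an Arrow Schema can carry. The key name `sorted_columns`, the JSON encoding, and the default column names are all assumptions made for illustration; the issue does not specify any of them.

```python
import json

# Hypothetical metadata key; the issue does not name one.
SORTED_COLUMNS_KEY = "sorted_columns"

# Hypothetical stand-in for the per-record-type defaults that are hardcoded today.
DEFAULT_SORTED_COLUMNS = {
    "logs": ["resource_id", "scope_id", "time_unix_nano"],
}


def with_sorted_columns(metadata, columns):
    """Return a copy of the schema metadata that declares the sort order."""
    updated = dict(metadata)
    # Schema metadata values are strings, so the list is serialized as JSON.
    updated[SORTED_COLUMNS_KEY] = json.dumps(list(columns))
    return updated


def sorted_columns(metadata, record_type):
    """Read the declared sort order, falling back to the hardcoded default."""
    raw = metadata.get(SORTED_COLUMNS_KEY)
    if raw is None:
        return list(DEFAULT_SORTED_COLUMNS.get(record_type, []))
    return json.loads(raw)
```

With this shape, a sender that picks a different order only has to write it into the metadata, and a receiver that finds no declaration keeps the current behavior.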

For example, the default list of sorted columns may not always be ideal for optimizing compression ratios for specific tasks. By allowing for a dynamic column order based on entropy, we can potentially achieve improved compression. Carrying this list in the schema gives an adaptive receiver the information it needs to learn which order the sender chose and decode the Arrow records accurately.
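One way an entropy-based order could be derived is sketched below. It is an assumption, not the project's method: it computes the Shannon entropy of each column in a batch and sorts on the lowest-entropy columns first, a heuristic that tends to produce longer runs of repeated values. The function names and the sample batch are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the value distribution in one column."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_sort_order(columns):
    """Order column names from lowest to highest entropy.

    `columns` maps a column name to its list of values. Ties are broken
    by name so that the result is deterministic.
    """
    return sorted(columns, key=lambda name: (shannon_entropy(columns[name]), name))


# Hypothetical batch, for illustration only.
batch = {
    "trace_id": ["t1", "t2", "t3", "t4"],
    "service": ["a", "b", "a", "b"],
    "severity": ["INFO", "INFO", "WARN", "INFO"],
}
order = entropy_sort_order(batch)  # ['severity', 'service', 'trace_id']
```

The resulting list is what the sender would declare in the schema so that the receiver can tell which order was applied to a given record.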

lquerel, Aug 22 '23 21:08