vortex icon indicating copy to clipboard operation
vortex copied to clipboard

Sorting capabilities on arrays

Open AdamGS opened this issue 3 months ago • 1 comments

Sorting data is obviously a common use case, but these are a few features that I think would be useful:

  1. Sorting a single array, for some types it could even happen without moving data
  2. Lexical sort using multiple columns
  3. Sorting to indices which gives us an array to use with take, and which passing around or holding in memory is much cheaper.
  4. Sorting with a limit - only returning limit items
  5. Sorting can be in both ascending and descending orders.

References

AdamGS avatar Sep 25 '25 10:09 AdamGS

I assume Vortex can do sorting of some of its encoded/compressed data faster than the sort once it's Arrow data? If so you may be interested in this change: https://github.com/apache/datafusion/pull/17337. This will eventually allow a table scan to say "I can provide sorted data" or at least for the scan to produce sorted data (so that a smart sort algorithm upstream can use insertion sort or something) which practically gives a format like Vortex the ability to do it's own sorting.

adriangb avatar Oct 15 '25 04:10 adriangb