Define timestamp ordering for int96 timestamp columns

Open • rahulketch opened this issue 5 months ago • 2 comments

Describe the enhancement requested

int96 is the default timestamp type in Spark. Timestamp filters are extremely common, and the lack of a defined ordering and of statistics limits optimization opportunities.

See also:

  • https://github.com/apache/parquet-java/issues/3242
  • https://github.com/apache/arrow-rs/issues/7686
  • https://lists.apache.org/thread/6fm50b3pmh6mz659jb5wx5vzmvwccz1n
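
To illustrate why an ordering has to be defined explicitly, here is a minimal sketch. It assumes the layout commonly written by Spark and Impala: 8 bytes of little-endian nanoseconds within the day, followed by 4 bytes of little-endian Julian day number. The helper names (`encode_int96`, `chronological_key`) are hypothetical and not part of any Parquet library. Under that assumption, comparing the raw 12 bytes does not give chronological order, so min/max statistics computed that way would not be usable for timestamp filters.

```python
import struct

# Assumption: Julian day number of 1970-01-01, as used by the Spark/Impala convention.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
NANOS_PER_DAY = 86_400 * 10**9


def encode_int96(unix_nanos: int) -> bytes:
    """Pack a Unix timestamp (in nanoseconds) into the assumed 12-byte layout."""
    days, nanos_of_day = divmod(unix_nanos, NANOS_PER_DAY)
    # "<qi": little-endian int64 (nanos of day) followed by int32 (Julian day).
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def chronological_key(raw: bytes) -> tuple:
    """Sort key that compares the Julian day first, then the nanos of day."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


# 1970-01-01 00:00:00.000000001 and 1970-01-02 00:00:00
earlier = encode_int96(1)
later = encode_int96(NANOS_PER_DAY)

by_bytes = sorted([earlier, later])
by_time = sorted([earlier, later], key=chronological_key)

# The raw byte comparison puts the later timestamp first.
print(by_bytes == [later, earlier])
print(by_time == [earlier, later])
```

A reader that wants to prune row groups on such a column would need statistics computed with a comparison like `chronological_key`, which is the kind of ordering this issue asks the format to specify.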

rahulketch, Jun 25 '25 10:06

Here is a PR that attempts to clarify the current status (does not attempt to actually define the ordering):

  • https://github.com/apache/parquet-format/pull/504

alamb, Jun 25 '25 18:06

Here is a document with background clarification as well:

  • https://docs.google.com/document/d/1Ox0qHYBgs_3-pNqn9V8zVQm_W6qP0lsbd2XwQnQVz1Y/edit?tab=t.0#heading=h.nx463wi3cktx

alamb, Jun 25 '25 18:06