parquet-format
Define timestamp ordering for int96 timestamp columns
Describe the enhancement requested
INT96 is the default Parquet timestamp type written by Spark. Timestamp filters are extremely common, and because no sort order is defined for INT96 columns, their min/max statistics are unusable. This limits optimization opportunities such as row-group and page pruning. See also:
- https://github.com/apache/parquet-java/issues/3242
- https://github.com/apache/arrow-rs/issues/7686
- https://lists.apache.org/thread/6fm50b3pmh6mz659jb5wx5vzmvwccz1n
Here is a PR that attempts to clarify the current status (it does not attempt to define the ordering itself):
- https://github.com/apache/parquet-format/pull/504
Here is also a document with additional background:
- https://docs.google.com/document/d/1Ox0qHYBgs_3-pNqn9V8zVQm_W6qP0lsbd2XwQnQVz1Y/edit?tab=t.0#heading=h.nx463wi3cktx