Allow Specifying Partitioning Function for External Mappings
(This depends on the completion of #71 and #72.)
The partition function for external mappings is derived by parsing the paths of the data files, à la Hive's directory layout.
For instance the structure:
/date=2018-11-12/file.avsc
/date=2018-11-13/file.avsc
Would create a new column date with string values 2018-11-12 and 2018-11-13, and assume the partitioning function is identity(date), instead of being able to derive it from another field (e.g. a function of the date part of a timestamp column).
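To make the current behavior concrete, here is a minimal sketch of Hive-style path parsing. It is not Iceberg's actual code; the function name `parse_hive_partitions` is hypothetical and only illustrates how each `key=value` directory becomes a string-valued column.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path):
    """Return {column: raw string value} for each key=value directory in path.

    Hypothetical helper for illustration only, not part of Iceberg.
    """
    partitions = {}
    # Drop the last component, which is the data file itself.
    for part in PurePosixPath(path).parts[:-1]:
        key, sep, value = part.partition("=")
        # Only directories of the form key=value contribute a column.
        if sep and key:
            partitions[key] = value
    return partitions


print(parse_hive_partitions("/date=2018-11-12/file.avsc"))
print(parse_hive_partitions("/date=2018-11-13/file.avsc"))
```

Note that the values come back as plain strings, which is why the derived partitioning ends up being identity(date) on a string column.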
Iceberg should let users specify their own partitioning function, based on existing columns.
I think what you're trying to accomplish would be done a little differently. I understand the term "partitioning function" to mean the partition transformations that are part of a partition spec.
That's not the right place to do this because we don't need to add extra representations of a date to the manifest files. Instead, a process importing files from an external source should parse the strings and produce the right data value for the date (the day ordinal, counting from 1970-01-01 as day 0). Then Iceberg would use the same partition code for these files.
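As a rough sketch of that conversion, an importing process could turn the path string into a day ordinal like this. The helper name `date_string_to_day_ordinal` is hypothetical, and it assumes the directory values are ISO-8601 dates (YYYY-MM-DD) as in the example paths.

```python
from datetime import date

# Day 0 of the ordinal described above.
EPOCH = date(1970, 1, 1)


def date_string_to_day_ordinal(value):
    """Convert an ISO date string to the number of days since 1970-01-01.

    Hypothetical helper for illustration only, not part of Iceberg.
    """
    return (date.fromisoformat(value) - EPOCH).days


print(date_string_to_day_ordinal("1970-01-01"))  # 0
print(date_string_to_day_ordinal("2018-11-12"))  # 17847
```

The importer would store this integer as the partition value rather than the original string.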