Allow Specifying Partitioning Function for External Mappings

Open omervk opened this issue 6 years ago • 1 comments

(this is dependent upon the completion of #71 and #72)

The partition function for external mappings is derived by parsing the paths of the data files, à la Hive's directory format.

For instance the structure:

/date=2018-11-12/file.avsc
/date=2018-11-13/file.avsc

Would create a new column date with string values 2018-11-12 and 2018-11-13 and assume the partitioning function is identity(date), with no way to derive it from another field (e.g. a function of the date part of a timestamp column).
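To make the path-derived behavior concrete, here is a minimal sketch of Hive-style path parsing. It is not Iceberg's code; the function name parse_hive_partitions is hypothetical and only illustrates how key=value directory segments become string-valued columns.

```python
from pathlib import PurePosixPath
from typing import Dict


def parse_hive_partitions(path: str) -> Dict[str, str]:
    """Collect key=value directory segments from a Hive-style file path.

    Hypothetical helper for illustration; values stay as raw strings,
    which is why the derived column ends up string-typed.
    """
    partitions: Dict[str, str] = {}
    # Only look at directory segments, not the file name itself.
    for segment in PurePosixPath(path).parent.parts:
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions


print(parse_hive_partitions("/date=2018-11-12/file.avsc"))
print(parse_hive_partitions("/date=2018-11-13/file.avsc"))
```

Each file yields a single date entry holding the raw string from its directory name, which corresponds to treating the partitioning as identity(date).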

Iceberg should let users specify their own partitioning function, based on existing columns.

omervk, Nov 13 '18 22:11

I think what you're trying to accomplish would be done a little differently. I understand the term "partitioning function" to mean the partition transformations that are part of a partition spec.

That's not the right place to do this because we don't need to add extra representations of a date to the manifest files. Instead, a process importing files from an external source should parse the strings and produce the right data value (day ordinal from 1970-01-01=0) for the date. Then Iceberg would use the same partition code for these files.
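As a rough sketch of that conversion, assuming the path values are ISO-8601 date strings: an import process could turn each string into a day ordinal counted from 1970-01-01. The helper name date_string_to_day_ordinal is hypothetical and is not part of Iceberg's API.

```python
from datetime import date

# Day ordinals are counted from the Unix epoch, so 1970-01-01 maps to 0.
EPOCH = date(1970, 1, 1)


def date_string_to_day_ordinal(value: str) -> int:
    """Convert an ISO date string such as '2018-11-12' to days since the epoch.

    Hypothetical helper for an import process; raises ValueError on
    strings that are not valid ISO dates.
    """
    return (date.fromisoformat(value) - EPOCH).days


print(date_string_to_day_ordinal("1970-01-01"))  # 0
print(date_string_to_day_ordinal("2018-11-12"))  # 17847
print(date_string_to_day_ordinal("2018-11-13"))  # 17848
```

The importer would then record the integer as the partition value, so imported files carry the same kind of date value as files written through Iceberg.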

rdblue, Nov 16 '18 19:11