[PROTOCOL RFC] Add `TIME` & `INTERVAL` as supported dtypes

Open ion-elgreco opened this issue 10 months ago • 6 comments

I often get asked why delta-rs doesn't support time/interval dtypes. Adding them seems obvious, since Parquet and most engines already support them.

We can do the same thing as timestampNtz and gate these behind a reader & writer feature. I am proposing we add these primitive types:

| Type Name | Description |
|-|-|
| time | Microsecond precision time of day (backed by parquet TimeType) https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time |
| interval | A fixed amount of time https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval |

Time

This feature introduces a new data type to support time. For example: 00:00:00, or 24:00:00. The serialization method would be: {hour}:{minute}:{second}
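As a rough illustration of the `{hour}:{minute}:{second}` serialization, here is a minimal Python sketch. It assumes the value is held as microseconds since midnight (the way Parquet stores a microsecond-precision TIME), zero-pads each field to two digits, and drops any sub-second part, since the proposal does not say how fractional seconds would be written. The helper names `serialize_time` and `parse_time` are hypothetical and not part of delta-rs.

```python
MICROS_PER_SECOND = 1_000_000
# Assumption: 24:00:00 is accepted as an upper bound because the proposal lists it as an example.
MAX_MICROS = 24 * 3600 * MICROS_PER_SECOND


def serialize_time(micros_since_midnight: int) -> str:
    """Hypothetical helper: format microseconds since midnight as HH:MM:SS."""
    if not 0 <= micros_since_midnight <= MAX_MICROS:
        raise ValueError("time of day out of range")
    # Sub-second precision is truncated here; the proposal leaves this unspecified.
    total_seconds = micros_since_midnight // MICROS_PER_SECOND
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def parse_time(text: str) -> int:
    """Hypothetical helper: parse HH:MM:SS back into microseconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * MICROS_PER_SECOND
```

If fractional seconds need to survive a round trip, the format string would have to be extended, which is a decision for the protocol text rather than this sketch.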

Interval

This feature introduces a new data type to support day-time intervals. For example: 5d 10h 5m 10s. The serialization method would be: {day}d:{hour}h:{minute}m:{second}s
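A minimal Python sketch of the `{day}d:{hour}h:{minute}m:{second}s` serialization follows. It uses the colon-separated format string as written above (the example value is shown with spaces instead), represents the interval as a `datetime.timedelta`, rejects negative intervals, and ignores sub-second precision. All of these are assumptions, and `serialize_interval` and `parse_interval` are hypothetical names.

```python
import re
from datetime import timedelta

# Matches the colon-separated form, e.g. "5d:10h:5m:10s".
_INTERVAL_PATTERN = re.compile(r"(\d+)d:(\d+)h:(\d+)m:(\d+)s")


def serialize_interval(delta: timedelta) -> str:
    """Hypothetical helper: format a non-negative day-time interval."""
    if delta < timedelta(0):
        # Assumption: the proposal does not describe negative intervals.
        raise ValueError("negative intervals are not handled in this sketch")
    # timedelta normalizes to whole days plus 0 <= seconds < 86400.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{delta.days}d:{hours}h:{minutes}m:{seconds}s"


def parse_interval(text: str) -> timedelta:
    """Hypothetical helper: parse the serialized form back into a timedelta."""
    match = _INTERVAL_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid interval: {text!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
```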

To support these features:

To have a column of Time/Interval type in a table, the table must have Reader Version 3 and Writer Version 7. A feature named time/interval must exist in the table's readerFeatures and writerFeatures.
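The gating rule can be sketched as a check against a table's protocol action. This is only an illustration: it assumes the protocol is available as a plain dict with the `minReaderVersion`, `minWriterVersion`, `readerFeatures` and `writerFeatures` fields, and it assumes `time` and `interval` are two separate feature names, which the proposal does not settle. The function `supports_feature` is hypothetical.

```python
def supports_feature(protocol: dict, feature: str) -> bool:
    """Hypothetical check: is `feature` enabled for both readers and writers?"""
    # Either feature list may be absent or null on older protocol versions.
    reader_features = protocol.get("readerFeatures") or []
    writer_features = protocol.get("writerFeatures") or []
    return (
        protocol.get("minReaderVersion", 0) >= 3
        and protocol.get("minWriterVersion", 0) >= 7
        and feature in reader_features
        and feature in writer_features
    )


# Example protocol action; the "time" feature name is an assumption.
protocol = {
    "minReaderVersion": 3,
    "minWriterVersion": 7,
    "readerFeatures": ["time"],
    "writerFeatures": ["time"],
}
```

With this example, `supports_feature(protocol, "time")` is true, while `supports_feature(protocol, "interval")` is false because that feature is not listed.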

Engine support:

| Engine | Type Name | Support | Equivalent engine type |
|-|-|-|-|
| Spark | time | No | N/A |
| Spark | interval | Yes | DayTimeIntervalType |
| Trino | time | Yes | TIME(P) |
| Trino | interval | Yes | INTERVAL DAY TO SECOND |
| Flink | time | Yes | TIME(P) |
| Flink | interval | Yes | INTERVAL DAY TO SECOND |
| Datafusion | time | Yes | Time (in arrow: Time64) |
| Datafusion | interval | Yes | Interval(DayTime) |

ion-elgreco • Mar 30 '24 00:03

Supersedes this issue: https://github.com/delta-io/delta/issues/2319

@tdas @ryan-johnson-databricks and @bart-samwel

ion-elgreco • Mar 30 '24 00:03

This makes sense to me if it's just like TimestampNTZ. It's unfortunate that Spark doesn't have a TIME type (yet). :/

bart-samwel • Apr 08 '24 12:04

@bart-samwel what would be the next steps to make this happen?

ion-elgreco • Apr 10 '24 20:04

Seems like we need to add the datatype support to spark, as a starting point? (seems quite doable, but I'm not the expert there)

scovich • Apr 19 '24 22:04

> Seems like we need to add the datatype support to spark, as a starting point? (seems quite doable, but I'm not the expert there)

Can we maybe start with interval first then, since it's supported in all engines?

I can add support for both types in delta-kernel-rs and delta-rs.

ion-elgreco • May 24 '24 06:05

@scovich @tdas @ryan-johnson-databricks and @bart-samwel

Can we move forward with adding this to the protocol?

@scovich I don't see why we should add this to Spark first.

ion-elgreco • Aug 04 '24 22:08