datafusion
datafusion copied to clipboard
Make from_unixtime aware of execution timezone
Is your feature request related to a problem or challenge?
No response
Describe the solution you'd like
Applies an offset to the unixtime after the cast.
Describe alternatives you've considered
No response
Additional context
Datafusion's from_unixtime is not aware of timezone. The following code block demonstrates the interaction between me and datafusion-cli.
> set datafusion.execution.time_zone = '+08:00';
0 row(s) fetched.
Elapsed 0.000 seconds.
> select to_unixtime('2024-09-01 10:00:00+08:00');
+------------------------------------------------+
| to_unixtime(Utf8("2024-09-01 10:00:00+08:00")) |
+------------------------------------------------+
| 1725156000 |
+------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
> select from_unixtime(1725156000);
+----------------------------------+
| from_unixtime(Int64(1725156000)) |
+----------------------------------+
| 2024-09-01T02:00:00 |
+----------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
Specifically, when converting a date to unixtime with the to_unixtime function, we can provide a timezone and the cast result could reveal the timezone. However, when converting the unixtime back to a date, Datafusion is not aware of timezone. Yes, the cast result is correct, but the timezone info is discarded and only a date with timezone UTC+0 is returned.
I wonder if we can make the from_unixtime function aware of timezone. For example, by setting the execution timezone through the datafusion.execution.time_zone configuration, we can apply an offset to the cast result so the timezone info could be revealed.
UDF's currently do not have access to the DF context which would make implementing this using configuration a bit difficult. What could be easily supported is allowing an additional argument 'tz' which would be used to generate the timestamp in tz aware fashion.
@Omega359 would that be desirable? I can start working on to overload from_unixtime function with additional time_zone parameter as you've mentioned.
Sounds good. I would like to still think about what it would take to get udf's access to the config though.
This may now be feasible after
- https://github.com/apache/datafusion/issues/13519
UDF's currently do not have access to the DF context which would make implementing this using configuration a bit difficult.
- https://github.com/apache/datafusion/pull/16573
Now that #16970 was merged this should now be feasible.