datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Make from_unixtime aware of execution timezone

Open niebayes opened this issue 1 year ago • 1 comments

Is your feature request related to a problem or challenge?

No response

Describe the solution you'd like

Applies an offset to the unixtime after the cast.

Describe alternatives you've considered

No response

Additional context

Datafusion's from_unixtime is not aware of timezone. The following code block demonstrates the interaction between me and datafusion-cli.

> set datafusion.execution.time_zone = '+08:00';
0 row(s) fetched. 
Elapsed 0.000 seconds.

> select to_unixtime('2024-09-01 10:00:00+08:00');
+------------------------------------------------+
| to_unixtime(Utf8("2024-09-01 10:00:00+08:00")) |
+------------------------------------------------+
| 1725156000                                     |
+------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.001 seconds.

> select from_unixtime(1725156000);
+----------------------------------+
| from_unixtime(Int64(1725156000)) |
+----------------------------------+
| 2024-09-01T02:00:00              |
+----------------------------------+
1 row(s) fetched. 
Elapsed 0.001 seconds.

Specifically, when converting a date to unixtime with the to_unixtime function, we can provide a timezone and the cast result could reveal the timezone. However, when converting the unixtime back to a date, Datafusion is not aware of timezone. Yes, the cast result is correct, but the timezone info is discarded and only a date with timezone UTC+0 is returned.

I wonder if we can make the from_unixtime function aware of timezone. For example, by setting the execution timezone through the datafusion.execution.time_zone configuration, we can apply an offset to the cast result so the timezone info could be revealed.

niebayes avatar Oct 12 '24 09:10 niebayes

UDF's currently do not have access to the DF context which would make implementing this using configuration a bit difficult. What could be easily supported is allowing an additional argument 'tz' which would be used to generate the timestamp in tz aware fashion.

Omega359 avatar Oct 12 '24 18:10 Omega359

@Omega359 would that be desirable? I can start working on to overload from_unixtime function with additional time_zone parameter as you've mentioned.

buraksenn avatar Oct 21 '24 19:10 buraksenn

Sounds good. I would like to still think about what it would take to get udf's access to the config though.

Omega359 avatar Oct 21 '24 21:10 Omega359

This may now be feasible after

  • https://github.com/apache/datafusion/issues/13519

alamb avatar Nov 21 '24 16:11 alamb

UDF's currently do not have access to the DF context which would make implementing this using configuration a bit difficult.

  • https://github.com/apache/datafusion/pull/16573

findepi avatar Jun 26 '25 15:06 findepi

Now that #16970 was merged this should now be feasible.

Omega359 avatar Aug 05 '25 17:08 Omega359