arrow-julia

DST ambiguities in ZonedDateTime not supported

Status: Open. tpgillam opened this issue 2 years ago • 3 comments

It seems like the ArrowTypes representation of ZonedDateTime doesn't include enough information to resolve ambiguities around DST, e.g.:

```julia
julia> zdt = ZonedDateTime(DateTime(2020, 11, 1, 6), tz"America/New_York"; from_utc=true)
2020-11-01T01:00:00-05:00

julia> arrow_zdt = ArrowTypes.toarrow(zdt)
Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")}(1604192400000)

julia> ArrowTypes.fromarrow(ZonedDateTime, arrow_zdt)
ERROR: AmbiguousTimeError: Local DateTime 2020-11-01T01:00:00 is ambiguous within America/New_York
Stacktrace:
 [1] (::TimeZones.var"#construct#8"{DateTime, VariableTimeZone})(T::Type{Local})
   @ TimeZones ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:46
 [2] #ZonedDateTime#7
   @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:50 [inlined]
 [3] ZonedDateTime
   @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:37 [inlined]
 [4] convert(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:265
 [5] fromarrow(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:300
 [6] top-level scope
   @ REPL[16]:1
```

Knowing very little about how we're constrained by the Arrow spec: could this be fixed by storing the UTC timestamp? I'm guessing we run into this because we store a local timestamp plus the time zone (?), which isn't enough information to tell the two 01:00 instants on the fall-back day apart.
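A minimal sketch of what a UTC-based round trip could look like, assuming TimeZones.jl is installed. `to_utc_millis` and `from_utc_millis` are hypothetical helper names for illustration, not Arrow.jl API. The idea is that the UTC instant plus the zone name is enough to rebuild the value with `from_utc=true`, which never has to resolve a local wall-clock time and so cannot hit an `AmbiguousTimeError`.

```julia
using Dates, TimeZones

const UNIX_EPOCH = DateTime(1970, 1, 1)

# Hypothetical helper: milliseconds since the Unix epoch of the *UTC* instant.
to_utc_millis(zdt::ZonedDateTime) = Dates.value(DateTime(zdt, UTC) - UNIX_EPOCH)

# Hypothetical helper: rebuild the ZonedDateTime from UTC millis plus the zone.
# With from_utc=true the mapping UTC -> local is unique, so nothing is ambiguous.
from_utc_millis(ms::Integer, tz) =
    ZonedDateTime(UNIX_EPOCH + Millisecond(ms), tz; from_utc=true)

# The ambiguous value from the report: 01:00 EST on the fall-back day.
zdt = ZonedDateTime(DateTime(2020, 11, 1, 6), tz"America/New_York"; from_utc=true)

ms = to_utc_millis(zdt)                                # 1604210400000
roundtrip = from_utc_millis(ms, tz"America/New_York")
roundtrip == zdt                                       # true
```

Whether Arrow.jl should store exactly this value depends on the time unit in the `Arrow.Timestamp` type parameter; the sketch assumes milliseconds, matching the example above.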

tpgillam · Jun 28 '22 12:06

Off the top of my head, the spec defines a timestamp either as a UTC timestamp plus a time zone, or as a timestamp with no (unknown) time zone. A local timestamp plus a time zone is not enough to disambiguate, but a UTC timestamp plus a time zone is.
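To see why, here is a toy model using only the `Dates` stdlib. It assumes a hand-coded rule for America/New_York in 2020 only (EDT from 2020-03-08T07:00Z until 2020-11-01T06:00Z, EST otherwise); `ny_offset_2020`, `utc_to_local` and `local_to_utc_candidates` are hypothetical names, and a real implementation would consult the tz database through TimeZones.jl instead. Going from UTC to local time always gives one answer, while going from local time back to UTC can give two answers (fall back) or none (spring forward).

```julia
using Dates

# Hypothetical, hard-coded rule valid only for America/New_York in 2020:
# EDT (UTC-4) from 2020-03-08T07:00Z up to 2020-11-01T06:00Z, otherwise EST (UTC-5).
const DST_START_UTC = DateTime(2020, 3, 8, 7)
const DST_END_UTC = DateTime(2020, 11, 1, 6)

ny_offset_2020(utc::DateTime) =
    DST_START_UTC <= utc < DST_END_UTC ? Hour(-4) : Hour(-5)

# UTC -> local is a plain function: exactly one local time per instant.
utc_to_local(utc::DateTime) = utc + ny_offset_2020(utc)

# local -> UTC has to try every offset the zone uses and keep the consistent ones.
function local_to_utc_candidates(local_dt::DateTime)
    candidates = DateTime[]
    for offset in (Hour(-4), Hour(-5))
        utc = local_dt - offset
        # keep this guess only if the rule really assigns `offset` to that instant
        ny_offset_2020(utc) == offset && push!(candidates, utc)
    end
    return candidates
end

# 01:00 local on the fall-back day corresponds to two instants (05:00Z and 06:00Z),
local_to_utc_candidates(DateTime(2020, 11, 1, 1))
# while each of those instants maps back to a single local time (01:00).
utc_to_local(DateTime(2020, 11, 1, 6))
```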

rok · Jun 28 '22 13:06

The relevant sections of code are:

https://github.com/apache/arrow-julia/blob/532b89b2c5740124cadca632a14ebb6cc9a0dca5/src/eltypes.jl#L273-L274

and

https://github.com/apache/arrow-julia/blob/532b89b2c5740124cadca632a14ebb6cc9a0dca5/src/eltypes.jl#L263-L266

which confirm that we currently store an Arrow.Timestamp holding the local (wall-clock) time since the epoch, together with the time zone.
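The collision can be reproduced with the `Dates` stdlib alone. This sketch hard-codes the two New York offsets in effect on 2020-11-01 instead of calling TimeZones.jl, and `millis_since_epoch` is a hypothetical helper. Encoding the local wall-clock time maps the distinct instants 05:00Z (01:00 EDT) and 06:00Z (01:00 EST) to the same number, 1604192400000, which is the value in the `Arrow.Timestamp` from the report; encoding the UTC instant keeps them apart.

```julia
using Dates

const UNIX_EPOCH = DateTime(1970, 1, 1)

# Hypothetical helper: milliseconds between the epoch and a DateTime taken at face value.
millis_since_epoch(dt::DateTime) = Dates.value(dt - UNIX_EPOCH)

# Two different instants on the fall-back day in America/New_York.
utc_edt = DateTime(2020, 11, 1, 5)   # 01:00 EDT (UTC-4)
utc_est = DateTime(2020, 11, 1, 6)   # 01:00 EST (UTC-5)

# Local-time encoding: apply the offset in effect, then count from the epoch.
local_ms_edt = millis_since_epoch(utc_edt - Hour(4))   # 1604192400000
local_ms_est = millis_since_epoch(utc_est - Hour(5))   # 1604192400000 (same value)

# UTC encoding: count from the epoch directly, so the instants stay distinct.
utc_ms_edt = millis_since_epoch(utc_edt)               # 1604206800000
utc_ms_est = millis_since_epoch(utc_est)               # 1604210400000
```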

This is the closest thing to an official specification that I could find, and it does say that the timestamp part should be in UTC:

https://github.com/apache/arrow/blob/20b66c255ff617c438775e54081eaa02d5b983e1/js/src/fb/timestamp.ts#L16-L21

Unless I'm missing something, I think this means that we currently have broken interop with other languages for zoned date times?
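As a rough illustration of the interop concern, assume a reader in another language follows the spec and treats the stored value as milliseconds since the epoch in UTC. Decoding 1604192400000 that way with the `Dates` stdlib (`unix2datetime` takes seconds) lands five hours before the instant that was actually written, i.e. off by the zone's UTC offset at that moment:

```julia
using Dates

stored_ms = 1604192400000                 # value written by the current encoder (local wall clock)
intended_utc = DateTime(2020, 11, 1, 6)   # the instant the ZonedDateTime actually represents

# A spec-conforming reader interprets the stored value as UTC milliseconds since the epoch.
decoded_utc = unix2datetime(stored_ms / 1000)   # 2020-11-01T01:00:00

# The discrepancy equals the magnitude of the UTC offset in effect (EST, UTC-5).
offset_error = intended_utc - decoded_utc       # 18000000 milliseconds, i.e. 5 hours
```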

tpgillam · Jun 28 '22 18:06

> Unless I'm missing something, I think this means that we currently have broken interop with other languages for zoned date times?

It does seem so.

rok · Jun 28 '22 18:06