arrow-julia
DST ambiguities in ZonedDateTime not supported
It seems like the ArrowTypes representation of ZonedDateTime doesn't include enough information to resolve ambiguities around DST, e.g.:
```julia
julia> zdt = ZonedDateTime(DateTime(2020, 11, 1, 6), tz"America/New_York"; from_utc=true)
2020-11-01T01:00:00-05:00

julia> arrow_zdt = ArrowTypes.toarrow(zdt)
Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")}(1604192400000)

julia> ArrowTypes.fromarrow(ZonedDateTime, arrow_zdt)
ERROR: AmbiguousTimeError: Local DateTime 2020-11-01T01:00:00 is ambiguous within America/New_York
Stacktrace:
 [1] (::TimeZones.var"#construct#8"{DateTime, VariableTimeZone})(T::Type{Local})
   @ TimeZones ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:46
 [2] #ZonedDateTime#7
   @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:50 [inlined]
 [3] ZonedDateTime
   @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:37 [inlined]
 [4] convert(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:265
 [5] fromarrow(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:300
 [6] top-level scope
   @ REPL[16]:1
```
Knowing very little about how we're constrained by the Arrow spec, could this be fixed by storing the UTC timestamp instead? I'm guessing we're running into this because we store a local timestamp plus the timezone (?), which isn't quite enough information.
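Off the top of my head, the spec defines a timestamp either as a UTC timestamp plus a timezone, or as a timestamp with an unknown timezone. A local timestamp plus a timezone is not enough to disambiguate, because around a fall-back DST transition one local time corresponds to two UTC instants; a UTC timestamp plus a timezone is enough, because every UTC instant maps to exactly one local time.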
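To make the asymmetry concrete, here is a small sketch using only the Dates standard library. It assumes a hard-coded toy rule for America/New_York that models just the 2020-11-01 fall-back transition (06:00 UTC, when 02:00 EDT becomes 01:00 EST); the constant and function names are hypothetical, and real code would rely on TimeZones.jl and the tz database instead.

```julia
using Dates

# Toy model of America/New_York around the 2020 fall-back transition only.
# Assumption: real code should use TimeZones.jl rather than this hard-coded rule.
const FALL_BACK_UTC = DateTime(2020, 11, 1, 6)  # 02:00 EDT becomes 01:00 EST

# UTC -> local is a function: every UTC instant has exactly one offset.
utc_offset(utc::DateTime) = utc < FALL_BACK_UTC ? Hour(-4) : Hour(-5)
to_local(utc::DateTime) = utc + utc_offset(utc)

# Local -> UTC is not: collect every UTC instant whose local time matches.
function from_local(lcl::DateTime)
    candidates = [lcl + Hour(4), lcl + Hour(5)]  # try the EDT and EST offsets
    return filter(utc -> to_local(utc) == lcl, candidates)
end

to_local(DateTime(2020, 11, 1, 5))    # 2020-11-01T01:00:00 (EDT)
to_local(DateTime(2020, 11, 1, 6))    # 2020-11-01T01:00:00 (EST)
from_local(DateTime(2020, 11, 1, 1))  # two candidates: 05:00 and 06:00 UTC
```

The last call is exactly the situation behind the AmbiguousTimeError: given only the local 01:00 and the zone name, there is no way to choose between the two candidates.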
The relevant sections of code are:
https://github.com/apache/arrow-julia/blob/532b89b2c5740124cadca632a14ebb6cc9a0dca5/src/eltypes.jl#L273-L274
and
https://github.com/apache/arrow-julia/blob/532b89b2c5740124cadca632a14ebb6cc9a0dca5/src/eltypes.jl#L263-L266
which indeed show that we're currently using an Arrow.Timestamp object with local time since the epoch and the timezone.
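As a quick sanity check, the integer printed in the REPL session above can be decoded with only the Dates standard library. The helper name epoch_ms is hypothetical, used just for this illustration; the arithmetic shows that the stored value is the New York wall-clock time (01:00), not the UTC instant (06:00).

```julia
using Dates

# Milliseconds since 1970-01-01T00:00, treating the DateTime as a plain wall clock.
epoch_ms(dt::DateTime) = Dates.value(dt - DateTime(1970))

# Integer printed by ArrowTypes.toarrow(zdt) in the REPL session above.
stored_ms = 1604192400000

local_ms = epoch_ms(DateTime(2020, 11, 1, 1))  # New York wall-clock time, 01:00
utc_ms   = epoch_ms(DateTime(2020, 11, 1, 6))  # the same instant in UTC (01:00 EST)

println(stored_ms == local_ms)            # true: the wall-clock time is what gets stored
println((utc_ms - local_ms) ÷ 3_600_000)  # 5: the gap in hours, i.e. the EST offset
```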
This is the closest thing to an official specification that I could find, and it does say that the timestamp part should be in UTC:
https://github.com/apache/arrow/blob/20b66c255ff617c438775e54081eaa02d5b983e1/js/src/fb/timestamp.ts#L16-L21
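For comparison, a spec-style layout (UTC milliseconds plus a zone name) round-trips without ambiguity. This is a minimal sketch, not Arrow.jl's API: encode and decode are hypothetical names, the encoded value is a plain NamedTuple, and the same toy New York rule stands in for the tz database. In real code the decode step would presumably use ZonedDateTime(dt, tz; from_utc=true), as in the REPL session above.

```julia
using Dates

# Toy New York rule for the 2020 fall-back only (assumption; use TimeZones.jl in practice).
const FALL_BACK_UTC = DateTime(2020, 11, 1, 6)
ny_offset(utc::DateTime) = utc < FALL_BACK_UTC ? Hour(-4) : Hour(-5)

const EPOCH = DateTime(1970)

# Hypothetical encoder: store the UTC instant in milliseconds plus the zone name.
encode(utc::DateTime, tz::String) = (ms = Dates.value(utc - EPOCH), tz = tz)

# Hypothetical decoder: UTC -> local is a function, so the result is always unique.
function decode(x)
    x.tz == "America/New_York" || error("toy sketch only knows America/New_York")
    utc = EPOCH + Millisecond(x.ms)
    return (utc = utc, local_time = utc + ny_offset(utc), offset = ny_offset(utc))
end

edt = decode(encode(DateTime(2020, 11, 1, 5), "America/New_York"))
est = decode(encode(DateTime(2020, 11, 1, 6), "America/New_York"))
# Both have local_time 2020-11-01T01:00:00, but their offsets (-4 h vs -5 h) differ.
```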
Unless I'm missing something, I think this means that we currently have broken interop with other languages for zoned date times?
It does seem so.