
`warntimestamp` in convert too expensive

Open visr opened this issue 4 months ago • 3 comments

#172 added a warning. I think it is too expensive, and it may be better to just document the behavior instead.

I am reading a 7,760,500-row table written by pandas, which defaults to nanosecond resolution, and want to convert the timestamps to DateTime; I don't need sub-millisecond precision. The conversion worked out of the box but took 75 seconds. Profiling showed that almost all of the time was spent in warntimestamp generating the log message. Without the log message it takes 0.05 seconds.

This shows it in a benchmark:

```julia
using Chairmarks, Arrow, Dates

# alternative to convert that doesn't have warntimestamp
function to_datetime(x::Arrow.Timestamp{U, nothing})::DateTime where {U}
    x_since_epoch = Arrow.periodtype(U)(x.x)
    ms_since_epoch = Dates.toms(x_since_epoch)
    ut_instant = Dates.UTM(ms_since_epoch + Arrow.UNIX_EPOCH_DATETIME)
    return DateTime(ut_instant)
end

const ts = Arrow.Timestamp{Arrow.Flatbuf.TimeUnit.NANOSECOND, nothing}(1764288000000000000)
@b convert(DateTime, ts)  # 6.525 μs (119 allocs: 6.719 KiB)
@b to_datetime(ts)  # 1.332 ns
```

I now avoid this by reading with convert = false and applying the to_datetime function above, but I expect more people will run into this performance pitfall.
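
For readers who want to see the underlying arithmetic without Arrow installed, here is a minimal sketch using only Dates. The helper name `ns_to_datetime` is hypothetical and not part of Arrow.jl, and it assumes that truncating toward zero is acceptable; the rounding used by `Dates.toms` or Arrow itself may differ for values that are not a whole number of milliseconds.

```julia
using Dates

# Hypothetical helper (not part of Arrow.jl): convert nanoseconds since the
# Unix epoch to a DateTime, dropping everything below millisecond precision.
function ns_to_datetime(ns::Integer)::DateTime
    ms = div(ns, 1_000_000)  # integer division, truncates toward zero
    return DateTime(1970, 1, 1) + Millisecond(ms)
end

# Same raw value as the benchmark above: an exact number of milliseconds.
dt = ns_to_datetime(1764288000000000000)  # 2025-11-28T00:00:00

# 1_999_999 ns later: only the one whole millisecond survives the conversion.
dt_sub = ns_to_datetime(1764288000000000000 + 1_999_999)  # 2025-11-28T00:00:00.001
```

The second call shows the precision loss that the warning is about: the trailing 999,999 ns are silently discarded.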


visr avatar Aug 29 '25 08:08 visr

That also seems worth an issue in Julia itself, about maxlog=1 not being efficient.

Moelf avatar Aug 29 '25 14:08 Moelf

It seems that on unreleased Julia versions the logging is already a lot faster, but still ~40x slower than without the log (to_datetime performance seems stable):

```julia
@b convert(DateTime, ts)  # 6.525 μs (119 allocs: 6.719 KiB)  julia 1.11.6
@b convert(DateTime, ts)  # 55.349 ns (1 allocs: 16 bytes)    julia 1.12.0-rc1.45
@b convert(DateTime, ts)  # 53.556 ns (1 allocs: 16 bytes)    julia 1.13.0-DEV.1055
```
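
As a rough sanity check, the figures quoted in this thread can be put side by side. This sketch uses only the numbers reported above; the variable names are illustrative, and the timings come from one machine, so the ratios are indicative rather than exact.

```julia
# Per-row cost implied by the full-table conversion reported in the issue.
rows = 7_760_500
per_row_with_warning_us = 75 / rows * 1e6    # about 9.7 μs per row
per_row_without_ns      = 0.05 / rows * 1e9  # about 6.4 ns per row

# Slowdown of convert relative to to_datetime (1.332 ns) per Julia version.
t_plain_ns    = 1.332
slowdown_1_11 = 6525.0 / t_plain_ns   # roughly 4900x on julia 1.11.6
slowdown_1_12 = 55.349 / t_plain_ns   # roughly 42x on julia 1.12.0-rc1.45
slowdown_1_13 = 53.556 / t_plain_ns   # roughly 40x on julia 1.13.0-DEV.1055
```

The per-row cost of the 75-second conversion is of the same order of magnitude as the 6.525 μs single-call benchmark on 1.11.6, which is consistent with the log call dominating the runtime.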

It seems most of the damage comes from fixup_stdlib_path (at least on Windows), which comes before the maxlog handling, which seems to be up to the logger type to implement.

Generating the message is costly, and maxlog cannot prevent this.

I should add that from a UX perspective I'd also rather not have a warning here: either an error or nothing. Given that an error would be quite breaking, just documenting the behavior has my preference.

visr avatar Aug 29 '25 15:08 visr

> just documenting the behavior has my preference.

makes sense to me honestly.

Moelf avatar Aug 29 '25 15:08 Moelf