
Draft: Optimize to_timestamp

Open • vojtechtoman opened this pull request 1 year ago • 7 comments

Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/9090.

What changes are included in this PR?

This is still a work in progress. The PR introduces a few simple optimizations to to_timestamp when it is called without a format argument. On my local machine, these improve the benchmark results by about 15%.
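As a rough illustration of the kind of fast path involved (this is not the code from this PR, and every function name below is hypothetical), the sketch below handles only the exact layout YYYY-MM-DDTHH:MM:SS (or with a space instead of T), treats it as UTC, and returns None for anything else. The assumption is that a caller would then fall back to a general parser such as arrow-cast's string_to_timestamp_nanos. The sketch avoids allocation, rejects malformed input on the fixed separator positions before decoding digits, and converts the date with Howard Hinnant's days_from_civil algorithm.

```rust
/// Hypothetical fast path for format-less timestamp parsing (illustrative only).
/// Accepts exactly "YYYY-MM-DDTHH:MM:SS" or "YYYY-MM-DD HH:MM:SS", interpreted as UTC,
/// and returns nanoseconds since the Unix epoch. Any other input (fractional seconds,
/// offsets, other lengths) or an out-of-range result yields None.
fn parse_fast_timestamp_nanos(s: &str) -> Option<i64> {
    let b = s.as_bytes();
    if b.len() != 19 {
        return None;
    }
    // Reject early on the fixed separator positions, before decoding any digits.
    if b[4] != b'-'
        || b[7] != b'-'
        || (b[10] != b'T' && b[10] != b' ')
        || b[13] != b':'
        || b[16] != b':'
    {
        return None;
    }
    let year = parse_digits(&b[0..4])? as i64;
    let month = parse_digits(&b[5..7])?;
    let day = parse_digits(&b[8..10])?;
    let hour = parse_digits(&b[11..13])?;
    let minute = parse_digits(&b[14..16])?;
    let second = parse_digits(&b[17..19])?;

    // Range-check every field; leap seconds are not accepted.
    if !(1..=12).contains(&month)
        || day == 0
        || day > days_in_month(year, month)
        || hour > 23
        || minute > 59
        || second > 59
    {
        return None;
    }

    let days = days_from_civil(year, month, day);
    let secs = days * 86_400 + i64::from(hour * 3_600 + minute * 60 + second);
    // i64 nanoseconds only cover roughly the years 1677..2262, so overflow is possible.
    secs.checked_mul(1_000_000_000)
}

/// Decodes a run of ASCII digits; any non-digit byte yields None.
fn parse_digits(field: &[u8]) -> Option<u32> {
    let mut value = 0u32;
    for &c in field {
        // wrapping_sub maps bytes below b'0' to large values, so one comparison suffices.
        let d = c.wrapping_sub(b'0');
        if d > 9 {
            return None;
        }
        value = value * 10 + u32::from(d);
    }
    Some(value)
}

fn is_leap(year: i64) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i64, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            if is_leap(year) {
                29
            } else {
                28
            }
        }
        _ => 0,
    }
}

/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    // Shift the year so it starts in March; February's leap day is then the last day.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let m = i64::from(month);
    let mp = if m > 2 { m - 3 } else { m + 9 }; // month index counted from March
    let doy = (153 * mp + 2) / 5 + i64::from(day) - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}
```

For example, parse_fast_timestamp_nanos("1969-12-31T23:59:59") yields Some(-1_000_000_000), while a string with fractional seconds or a trailing Z yields None and would take the slower general path.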

Are these changes tested?

Yes, by the existing unit tests and benchmarks.

Are there any user-facing changes?

No.

vojtechtoman, Mar 19 '24

I wonder if this could make use of the upstream parsers, which are already highly optimised.

https://docs.rs/arrow-cast/latest/arrow_cast/parse/index.html

tustvold, Mar 20 '24

@tustvold The existing implementation was already using those (via string_to_timestamp_nanos). While profiling, I noticed early on that the upstream parser itself can be optimized further, so I copied some of the upstream parsing code into this PR to see whether there are any quick wins (there are). Obviously, these changes should eventually go upstream.

vojtechtoman, Mar 21 '24

Heh, I thought the code looked familiar. Perhaps we could work on this upstream to avoid a load of duplicated code and make it more obvious what has changed?

tustvold, Mar 21 '24

@tustvold Shall I open an issue in arrow-rs then?

vojtechtoman, Mar 21 '24

Or just file a PR, tbh; minor changes don't need issues.

tustvold, Mar 21 '24

@tustvold I have created https://github.com/apache/arrow-rs/pull/5542

vojtechtoman, Mar 22 '24

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot], May 22 '24