Draft: Optimize to_timestamp
Which issue does this PR close?
Closes https://github.com/apache/arrow-datafusion/issues/9090.
What changes are included in this PR?
This is still a work in progress. The PR introduces a few simple optimizations to to_timestamp (when called without a format argument). On my local machine, the benchmarks improve by about 15%.
Are these changes tested?
Yes, existing unit tests and benchmarks.
Are there any user-facing changes?
No.
I wonder if this could make use of the upstream parsers, which are already highly optimised:
https://docs.rs/arrow-cast/latest/arrow_cast/parse/index.html
@tustvold The existing implementation was already using those (via string_to_timestamp_nanos). One thing I noticed early on while profiling is that the upstream parser can be optimized even further. This is why I copied a bunch of parsing code from upstream into this PR, to see if there are any quick wins there (there are). Obviously, this should go upstream eventually.
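For readers unfamiliar with this kind of parser tuning, here is a minimal sketch of the general idea: reading digits at fixed byte offsets instead of going through a generic string-to-number path. It is only an illustration. The names `two_digits` and `parse_date_prefix` are hypothetical, and this is neither the code in this PR nor the upstream arrow-cast implementation.

```rust
/// Parses two ASCII digits at `bytes[i..i + 2]` into a number.
/// Returns `None` if the slice is too short or a byte is not a digit.
fn two_digits(bytes: &[u8], i: usize) -> Option<u32> {
    // Subtracting b'0' maps '0'..='9' to 0..=9; anything else wraps above 9.
    let hi = bytes.get(i)?.wrapping_sub(b'0');
    let lo = bytes.get(i + 1)?.wrapping_sub(b'0');
    if hi > 9 || lo > 9 {
        return None;
    }
    Some(hi as u32 * 10 + lo as u32)
}

/// Hypothetical helper: parses a leading `YYYY-MM-DD` from `s` without
/// allocating, and ignores whatever follows the date.
/// The range check is deliberately coarse: it does not validate the day
/// against the length of the month.
pub fn parse_date_prefix(s: &str) -> Option<(u32, u32, u32)> {
    let b = s.as_bytes();
    // Check the length and separator positions once, up front.
    if b.len() < 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let year = two_digits(b, 0)? * 100 + two_digits(b, 2)?;
    let month = two_digits(b, 5)?;
    let day = two_digits(b, 8)?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}
```

Calling `parse_date_prefix("2024-03-15T10:00:00Z")` yields `Some((2024, 3, 15))`, while inputs with the wrong separators or non-digit bytes yield `None`. Whether a change like this pays off in practice has to be confirmed with the benchmarks.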
Heh, I thought the code looked familiar. Perhaps we could work on this upstream to avoid a load of duplicated code and make it more obvious what has changed?
@tustvold Shall I open an issue in arrow-rs then?
Or just file a PR tbh; minor changes don't need issues.
@tustvold I have created https://github.com/apache/arrow-rs/pull/5542
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.