GH-15058: [C++][Python] Native support for UUID
Rationale for this change
See #15058. The UUID datatype is common throughout the ecosystem, and supporting it as a native type in Arrow would reduce friction.
What changes are included in this PR?
This PR implements the logic for the UUID canonical extension type in Arrow C++ and adds a Python wrapper.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, a new extension type is added.
- Closes: #15058
:warning: GitHub issue #15058 has been automatically assigned in GitHub to PR creator.
mentioned https://github.com/apache/arrow/issues/15058#issuecomment-1687558148 , do we need to remove that?
Key error: A type extension with name uuid already defined
Looks like Ruby already has support for UUID?
Hi @rok , can you please summarize where you're stuck at with UUIDs? I see some installation failures
@arogozhnikov I think it was mostly about lack of reviews :) I've rebased, let's see what current problems are.
Integration failures seem unrelated. It would be good to check again after https://github.com/apache/arrow/pull/41264 is merged to be sure.
I've opened a separate PR for the format change. If there are no objections I'd like to call for a ML vote tomorrow.
@jorisvandenbossche any idea who'd have time to review this?
@rok I'm taking a look. Thanks for being persistent :-)
Thanks for the review @pitrou! I've addressed all comments and am waiting to see if CI shakes out any problems.
@rok GitHub claims there are conflicts below ("This branch has conflicts that must be resolved"). Can you take a look?
Also, there are test failures in Python now :-)
Rebased. I'm not sure what the missing value_type method error when pickling is about. Checking.
I think Python should be ok now.
Ping @jorisvandenbossche
Thanks everyone!
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2328b6ee39b497d9f48e6d342db9f7d0c34d9791.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details. It also includes information about 24 possible false positives for unstable benchmarks that are known to sometimes produce them.
Thanks a lot @rok!
As another possible follow-up, would we want to support inferring and converting uuid.UUID objects in the python->arrow conversion layer? (although not sure if that's something people would be waiting for)
@jorisvandenbossche that sounds convenient! I've opened https://github.com/apache/arrow/issues/43855 and will get around to it if someone else doesn't sooner :).
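For illustration only, here is a rough sketch of what the conversion step discussed above might look like on the Python side. It uses only the standard library and assumes the UUID values end up in 16-byte fixed-size binary storage. The helper names uuids_to_storage and storage_to_uuids are hypothetical and are not part of pyarrow or of this PR; the actual follow-up may well take a different approach.

```python
import uuid

# A UUID is 128 bits, so each value occupies 16 bytes of storage.
UUID_BYTE_WIDTH = 16


def uuids_to_storage(values):
    """Convert uuid.UUID objects (or None for nulls) to 16-byte values.

    Hypothetical helper: sketches the Python-object side of a conversion,
    not the real pyarrow conversion layer.
    """
    out = []
    for value in values:
        if value is None:
            out.append(None)  # keep nulls as nulls
        elif isinstance(value, uuid.UUID):
            out.append(value.bytes)  # 16 bytes, big-endian field order
        else:
            raise TypeError(f"expected uuid.UUID or None, got {type(value).__name__}")
    return out


def storage_to_uuids(raw):
    """Inverse of uuids_to_storage: rebuild uuid.UUID objects from 16-byte values."""
    result = []
    for item in raw:
        if item is None:
            result.append(None)
        elif len(item) != UUID_BYTE_WIDTH:
            raise ValueError(f"expected {UUID_BYTE_WIDTH} bytes, got {len(item)}")
        else:
            result.append(uuid.UUID(bytes=item))
    return result
```

If pyarrow is available, the list returned by uuids_to_storage could presumably be passed to pa.array with a 16-byte fixed-size binary type to build the storage array, but that step is left out here so the sketch runs without any third-party dependency.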