arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-15058: [C++][Python] Native support for UUID

Open rok opened this issue 2 years ago • 7 comments

Rationale for this change

See #15058. UUID datatype is common in throughout the ecosystem and Arrow as supporting it as a native type would reduce friction.

What changes are included in this PR?

This PR implements logic for Arrow canonical extension type in C++ and a Python wrapper.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, new extension type is added.

  • Closes: #15058

rok avatar Aug 22 '23 08:08 rok

:warning: GitHub issue #15058 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Aug 22 '23 08:08 github-actions[bot]

mentioned https://github.com/apache/arrow/issues/15058#issuecomment-1687558148 , do we need to remove that?

mapleFU avatar Aug 22 '23 08:08 mapleFU

Key error: A type extension with name uuid already defined

Looks Ruby already got support for UUID?

arogozhnikov avatar Sep 06 '23 23:09 arogozhnikov

Hi @rok , can you please summarize where you're stuck at with UUIDs? I see some installation failures

arogozhnikov avatar Dec 04 '23 20:12 arogozhnikov

@arogozhnikov I think it was mostly about lack of reviews :) I've rebased, let's see what current problems are.

rok avatar Dec 04 '23 21:12 rok

Integration failures seem unrelated. It would be good to check again after https://github.com/apache/arrow/pull/41264 merged to be sure.

rok avatar Apr 18 '24 18:04 rok

I've opened a seperate PR for the format change. If there's no objections I'd like to call for a ML vote tomorrow.

rok avatar Apr 18 '24 19:04 rok

@jorisvandenbossche any idea who'd have time to review this?

rok avatar Aug 22 '24 08:08 rok

@rok I'm taking a look. Thanks for being persistent :-)

pitrou avatar Aug 22 '24 09:08 pitrou

Thanks for the review @pitrou! I've addressed all comments and waiting to see if CI shakes out any problems.

rok avatar Aug 22 '24 11:08 rok

@rok Github claims there are conflicts below ("This branch has conflicts that must be resolved"). Can you take a look?

pitrou avatar Aug 22 '24 13:08 pitrou

Also, there are test failures in Python now :-)

pitrou avatar Aug 22 '24 13:08 pitrou

Rebased. I'm not sure what the value_type method missing error when pickling is about. Checking.

rok avatar Aug 22 '24 13:08 rok

I think python should be ok now.

rok avatar Aug 22 '24 13:08 rok

Ping @jorisvandenbossche

rok avatar Aug 26 '24 09:08 rok

Thanks everyone!

rok avatar Aug 26 '24 15:08 rok

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2328b6ee39b497d9f48e6d342db9f7d0c34d9791.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 24 possible false positives for unstable benchmarks that are known to sometimes produce them.

Thanks a lot @rok!

As another possible follow-up, would we want to support inferring and converting uuid.UUID objects in the python->arrow conversion layer? (although not sure if that's something people would be waiting for)

jorisvandenbossche avatar Aug 27 '24 17:08 jorisvandenbossche

@jorisvandenbossche that sounds convenient! I've opened https://github.com/apache/arrow/issues/43855 and will get around to it if someone else doesn't sooner :).

rok avatar Aug 27 '24 19:08 rok