Upgrade to Arrow 17

Open galipremsagar opened this issue 1 year ago • 7 comments

Description

This PR upgrades to arrow-17 in cudf.

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

galipremsagar avatar Jul 25 '24 20:07 galipremsagar

Thanks @vyasr - I had the same concern. Let's try to reduce our Arrow dependencies first, and we can upgrade this later in the 24.10 release cycle if still needed.

bdice avatar Jul 26 '24 04:07 bdice

I don't think we want to upgrade this, certainly not so quickly. We know that moving our Arrow dependency too quickly can cause pain for some of our users (CC @beckernick), and also we're targeting removing the libarrow dependency in 24.10 anyway.

Sounds good 👍

galipremsagar avatar Jul 26 '24 05:07 galipremsagar

@galipremsagar once #16640 merges, we should replace this PR with one that simply relaxes the pyarrow constraint. Would you be able to take point on testing the versions that we support? I expect that anything after pyarrow 13 should work for us (we need pyarrow 13 for https://github.com/apache/arrow/pull/36162).

@seberg when we add the earliest dependency testing to cudf (the cudf version of https://github.com/rapidsai/rmm/pull/1613) can we also ensure that the appropriate pyarrow (and pandas) versions are being tested there? Thank you!
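
For illustration only, the kind of floor check implied by "anything after pyarrow 13 should work" could be sketched as below. This is a rough sketch, not code from cudf: the names `MIN_PYARROW`, `parse_version`, and `is_supported` are hypothetical, and the only value taken from this thread is the pyarrow 13 minimum.

```python
import re

# Assumption: pyarrow 13 is the lowest supported release, as stated in this thread.
# No upper bound is modeled because the thread does not name one.
MIN_PYARROW = (13, 0, 0)


def parse_version(text):
    """Return the leading numeric release of a version string as a 3-tuple of ints.

    Missing components default to 0, and suffixes such as ".dev5" are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_supported(version_text, minimum=MIN_PYARROW):
    """Return True when the given version is at or above the minimum."""
    return parse_version(version_text) >= minimum
```

In a test environment one could pass `pyarrow.__version__` to `is_supported` to confirm that the installed pyarrow falls inside the range being tested.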

vyasr avatar Aug 22 '24 17:08 vyasr

@galipremsagar once #16640 merges, we should replace this PR with one that simply relaxes the pyarrow constraint. Would you be able to take point on testing the versions that we support? I expect that anything after pyarrow 13 should work for us (we need pyarrow 13 for apache/arrow#36162).

@seberg when we add the earliest dependency testing to cudf (the cudf version of rapidsai/rmm#1613) can we also ensure that the appropriate pyarrow (and pandas) versions are being tested there? Thank you!

Sure 👍

galipremsagar avatar Aug 22 '24 17:08 galipremsagar

when we add the earliest dependency testing to cudf (the cudf version of https://github.com/rapidsai/rmm/pull/1613) can we also ensure that the appropriate pyarrow (and pandas) versions are being tested there?

They already are part of gh-16570 (although you can double check the versions, and I guess this would modify them).

seberg avatar Aug 22 '24 17:08 seberg

Oh awesome, I thought there was a cudf PR for this but couldn't find it. Thanks for pointing it out! Yes, we'd just want to update the versions there.

vyasr avatar Aug 22 '24 17:08 vyasr

Right this is the cuDF PR: https://github.com/rapidsai/cudf/pull/16570

Think Matthew is going to work on adding some skips and xfails as appropriate: https://github.com/rapidsai/cudf/pull/16570#discussion_r1725956815

Guessing once that is done it can be merged

jakirkham avatar Aug 22 '24 21:08 jakirkham

Replaced by #16681.

vyasr avatar Aug 28 '24 16:08 vyasr