cudf icon indicating copy to clipboard operation
cudf copied to clipboard

adding wheel build for libcudf

Open msarahan opened this issue 10 months ago • 15 comments

Description

This is meant to add a C++ shared library wheel as an alternative to the statically linked self-contained wheel. It is part of https://github.com/rapidsai/build-planning/issues/33

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

msarahan avatar Apr 08 '24 15:04 msarahan

Stuck on dlpack here. The failure is:

/__w/cudf/cudf/python/cudf/build/cp310-cp310-manylinux_2_28_aarch64/cudf/_lib/interop.cxx:1443:10: fatal error: dlpack/dlpack.h: No such file or directory
   1443 | #include "dlpack/dlpack.h"
        |          ^~~~~~~~~~~~~~~~~
  compilation terminated.

https://github.com/rapidsai/cudf/actions/runs/8662725105/job/23755706695?pr=15483#step:9:290

msarahan avatar Apr 16 '24 16:04 msarahan

@vyasr the build is passing now, but the tests seem to show that the library (libcudf.so) can't be found. It does seem like libcudf-cuXY is being installed OK, so I'm thinking it has to be some kind of RPATH issue. I don't see how you've addressed this in the RMM PR. Any advice?

I've inspected one of the libcudf wheels, and it has both lib and lib64 folders. The lib folder has nvcomp stuff, and lib64 has libcudf.so.

msarahan avatar Apr 19 '24 14:04 msarahan

You won't see this in the rmm PR because rmm is header-only, so there is no library there. The raft PR is a more useful example here. What you want to look at is actually the Python components of the libraft library, specifically the load.py module. The idea is that we never set RPATHs on libraries explicitly. Instead, we use ctypes to dlopen the library before we import any of the extension modules. This is a more robust solution than setting RPATHS because of the various ways in which it is possible for Python packages to coexist on a user's system (different environments, PYTHONPATH modification, etc).

vyasr avatar Apr 19 '24 16:04 vyasr

You have to set the variables before the rapids-cmake clone happens, which in this case is in the include(../../rapids_config.cmake) line. IOW the variables as currently set won't have any effect. Also, I think you'll need to specify the repo as vyasr/rapids-cmake, not just vyasr.

vyasr avatar Apr 29 '24 19:04 vyasr

This is annoying: we've fixed directory, but there are now failures in the Python build trying to find nvcomp. Explanation is here.

vyasr avatar Apr 30 '24 01:04 vyasr

It looks like my last nvcomp patch worked! The Python wheels built as well.

vyasr avatar Apr 30 '24 22:04 vyasr

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Jun 19 '24 01:06 copy-pr-bot[bot]

/ok to test

msarahan avatar Jun 19 '24 01:06 msarahan

There's something wrong with RBB and pyarrow here. Here's the traceback: https://github.com/rapidsai/cudf/actions/runs/9653861106/job/26627010517#step:9:214

pyproject.toml has:

tool.rapids-build-backend]
requires = [
    "cmake>=3.26.4",
    "cython>=3.0.3",
    "ninja",
    "numpy==1.23.*",
    "pyarrow==16.1.0.*",
    "rmm==24.8.*,>=0.0.0a0",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.

which I think should be the right thing here. Maybe that's not the pyproject.toml that exists at the time of build, though. Is there any way to have RBB act in "debug mode" where it outputs the pyproject.toml that it has generated? @KyleFromNVIDIA

msarahan avatar Jun 24 '24 23:06 msarahan

Is there any way to have RBB act in "debug mode" where it outputs the pyproject.toml that it has generated?

None currently. What are you thinking this would look like? Writing the file to stdout? Copying it to an additional location? Leaving it as it wrote it after build is complete and not restoring it to its original state?

KyleFromNVIDIA avatar Jun 25 '24 13:06 KyleFromNVIDIA

Writing the file to stdout was my initial idea, but I'm open to whatever you think is best and/or easiest to do.

msarahan avatar Jun 25 '24 15:06 msarahan

Lots of problems with codecov stuff, which seem to be happening on other repos. There was one possibly concerning test failure:

FAILED python/cudf/cudf_pandas_tests/test_cudf_pandas.py::test_rmm_option_on_import[pool] - AssertionError: assert -8 == 0
 +  where -8 = CompletedProcess(args=['python', '-m', 'cudf.pandas', '/__w/cudf/cudf/python/cudf/cudf_pandas_tests/data/profile_basic.py'], returncode=-8, stdout='', stderr='').returncode
FAILED python/cudf/cudf_pandas_tests/test_cudf_pandas.py::test_rmm_option_on_import[managed_pool] - AssertionError: assert -8 == 0
 +  where -8 = CompletedProcess(args=['python', '-m', 'cudf.pandas', '/__w/cudf/cudf/python/cudf/cudf_pandas_tests/data/profile_basic.py'], returncode=-8, stdout='', stderr='').returncode

msarahan avatar Jun 26 '24 15:06 msarahan

Could you update the PR description now?

vyasr avatar Jun 27 '24 19:06 vyasr

Please open a PR at https://github.com/rapidsai/rapids-metadata to reflect the new project structure. This can be the same PR as the one that updates cugraph.

KyleFromNVIDIA avatar Jun 27 '24 21:06 KyleFromNVIDIA

@KyleFromNVIDIA https://github.com/rapidsai/rapids-metadata/pull/18

@vyasr the description is up-to-date. Thanks for the reminder.

msarahan avatar Jun 27 '24 23:06 msarahan

I'm picking this up, moving it back to "draft" for now until I get the tests passing, then will put it back up for review.

jameslamb avatar Jul 05 '24 15:07 jameslamb

I've retargeted this at branch-24.10 and updated all the version references.

jameslamb avatar Jul 29 '24 17:07 jameslamb

JFYI I'm expecting #16299 to get done before this PR, which means there will probably need to be some associated updates made here. On the off chance that the situation ends up reversed, we can work on updating that PR accordingly.

vyasr avatar Aug 14 '24 20:08 vyasr

@jameslamb OK #16299 is merged now so you've got a bunch of conflicts here. Let me know if you have any issues in resolving them!

vyasr avatar Aug 16 '24 19:08 vyasr

Thanks @vyasr .

If you have time to be a quick ci-codeowners reviewer for me, I'd appreciate a review on #16575. If that's merged, it'll also have some conflicts with this PR. No worries if you don't have time, not a big set of changes either way.

jameslamb avatar Aug 16 '24 19:08 jameslamb

Sure thing, looking now.

vyasr avatar Aug 16 '24 19:08 vyasr

Alright, I think this is ready for review!

I've updated the PR description with some relevant details.

I've gone through the prior review threads and resolved any that are now resolved. Anything still open is something where I'm specifically looking for reviewer feedback (except https://github.com/rapidsai/cudf/pull/15483#discussion_r1723657818, which is some pre-commit hook behavior I need to investigate).

I understand devcontainers changes will be necessary as well... I'll get those staged in a PR into https://github.com/rapidsai/devcontainers.

jameslamb avatar Aug 21 '24 18:08 jameslamb

I've gone through and responded to the many threads where there was open discussion that needed some feedback from me, but I haven't reviewed yet. Will do that soon.

vyasr avatar Aug 22 '24 00:08 vyasr

devcontainers job passed after merging https://github.com/rapidsai/devcontainers/pull/271,

https://github.com/rapidsai/cudf/actions/runs/10515765176/job/29173481700?pr=15483

😁 thanks @AyodeAwe

Given that + all the approvals here, I'm gonna merge this. Thanks so much for all the help everyone!!!

jameslamb avatar Aug 23 '24 15:08 jameslamb

/merge

jameslamb avatar Aug 23 '24 15:08 jameslamb