Implement groupby apply with JIT
Description
Experimental cuDF Groupby Apply JIT pipeline.
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
@bwyogatama We should move the `function.cu` file to a place like `cudf/cpp/src/udf/kernels.cu` (just the first place that came to mind, feel free to suggest a better one) so that this PR is exposed to C++ reviewers.
This CUDA source needs to be compiled to a PTX file so that it can be used by the Python code. I'm not sure what the best way to do this in cudf would be. @robertmaynard Any pointers are highly appreciated.
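For background on the Python side of this, here is a minimal sketch of how Numba can link a prebuilt PTX file into a JIT-compiled kernel. The file name `kernels.ptx` and the device function `mul_two` are hypothetical placeholders, not names from this PR:

```python
# Minimal sketch, assuming a device function `mul_two` compiled into kernels.ptx
# (both names are hypothetical). Numba links the PTX when JIT-compiling the kernel.
from numba import cuda, types

# Declare the extern "C" device function defined in the PTX file.
mul_two = cuda.declare_device("mul_two", types.float64(types.float64))

@cuda.jit(link=["kernels.ptx"])
def apply_kernel(inp, out):
    i = cuda.grid(1)
    if i < inp.size:
        out[i] = mul_two(inp[i])
```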
If we want to compile to PTX I think the cleanest CMake solution is to create an object library with just this file and then set `CUDA_PTX_COMPILATION`.
Also we should probably discuss reusing atomics etc. from libcudf, although I'm not yet sure how we want that to look since I don't think we want to make those part of the libcudf public API.
> If we want to compile to PTX I think the cleanest CMake solution is to create an object library with just this file and then set `CUDA_PTX_COMPILATION`.
> Also we should probably discuss reusing atomics etc. from libcudf, although I'm not yet sure how we want that to look since I don't think we want to make those part of the libcudf public API.
This is correct, but you will need to iterate over the values of `CMAKE_CUDA_ARCHITECTURES` and create a new object library for each value, since PTX compilation only supports a single arch:
```cmake
set(ptx_src "src/a.cu")
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
  add_library(ptx_example_${arch} OBJECT ${ptx_src})
  set_target_properties(ptx_example_${arch}
    PROPERTIES CUDA_ARCHITECTURES ${arch}
               CUDA_PTX_COMPILATION ON
  )
endforeach()
```
We need to update the install rules to also ship the ptx output
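As a rough illustration only, an install rule for the per-arch PTX objects might look like the sketch below; the destination, component, and target names are assumptions, not what this PR ended up doing:

```cmake
# Hypothetical sketch: install the PTX produced by each per-arch object library.
# The destination and component names are placeholders.
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
  install(FILES $<TARGET_OBJECTS:ptx_example_${arch}>
          DESTINATION lib/ptx
          COMPONENT cudf)
endforeach()
```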
> We should move the `function.cu` file to a place like `cudf/cpp/src/udf/kernels.cu` (just the first place that came to mind, feel free to suggest a better one) so that this PR is exposed to C++ reviewers.
We went back and forth on this. We considered leaving it somewhere in the python area since it serves only the python API. I think it's a good idea that it is owned by the c++ code owners though.
> We need to update the install rules to also ship the ptx output
We're going to need these PTX files to come as part of the conda packages as well. Both of these issues also need to be solved in https://github.com/rapidsai/cudf/pull/11319, which relies on the same pattern, so if we can solve them here that's great! :)
This is super cool to see! Can you also post one of the early benchmark plots comparing performance?
This PR has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled `inactive-90d` if there is no activity in the next 60 days.
@bwyogatama @brandon-b-miller I've updated the build system here to get the JIT code path to compile as part of the library. I've also merged in the latest `branch-22.10` and made some minor fixes to the testing code so that it gives us useful diagnostics for now. There are small differences between cudf and pandas groupby that cause some issues, so I'm working around them for now so that we can focus on tests that verify the JIT code path works roughly as expected.
Codecov Report
Base: 86.58% // Head: 85.83% // Decreases project coverage by -0.75% :warning:
Coverage data is based on head (eaa8ff7) compared to base (b6dccb3). Patch has no changes to coverable lines.
Additional details and impacted files
```diff
@@              Coverage Diff               @@
##           branch-23.02   #11452      +/-  ##
================================================
- Coverage        86.58%   85.83%    -0.75%
================================================
  Files              155      158        +3
  Lines            24368    25164      +796
================================================
+ Hits             21098    21599      +501
- Misses            3270     3565      +295
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/_version.py | 1.41% <0.00%> (-98.59%) | :arrow_down: |
| python/cudf/cudf/core/buffer/spill_manager.py | 72.50% <0.00%> (-7.50%) | :arrow_down: |
| python/cudf/cudf/core/udf/utils.py | 93.75% <0.00%> (-5.35%) | :arrow_down: |
| python/strings_udf/strings_udf/__init__.py | 66.66% <0.00%> (-2.69%) | :arrow_down: |
| python/cudf/cudf/utils/dtypes.py | 77.85% <0.00%> (-1.61%) | :arrow_down: |
| python/cudf/cudf/options.py | 86.11% <0.00%> (-1.59%) | :arrow_down: |
| python/cudf/cudf/core/single_column_frame.py | 94.30% <0.00%> (-1.27%) | :arrow_down: |
| python/cudf/cudf/io/json.py | 91.04% <0.00%> (-1.02%) | :arrow_down: |
| ...ython/custreamz/custreamz/tests/test_dataframes.py | 98.38% <0.00%> (-1.01%) | :arrow_down: |
| python/dask_cudf/dask_cudf/io/csv.py | 96.34% <0.00%> (-1.00%) | :arrow_down: |
| ... and 57 more | | |
@brandon-b-miller the tests now pass. The outstanding build failures are happening because some of the build steps run on systems without GPUs. I ported over the logic I wrote for `strings_udf` to select a PTX file for the appropriate architecture, and that code includes a numba call to get the current device. In `strings_udf` that call is inside a try-except, so I'm assuming that on these CI executors we just set `strings_udf.ENABLED = False` and otherwise continue happily. We need to be able to `import cudf` on a system without the CUDA driver available, so we need to do something similar in this PR: the `groupby_function` module should execute the driver version check conditionally, only when a CUDA driver is available. I'll let you and @bwyogatama decide exactly what that should look like in terms of the structure of the code as you reorganize.
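For illustration, a minimal sketch of that kind of guard, assuming a module-level `ENABLED` flag like the one in `strings_udf`; the names are placeholders, not the final structure of this PR:

```python
# Hypothetical sketch: only run the driver/device check when a CUDA driver is usable,
# so `import cudf` still works on machines without a GPU (e.g. some CI executors).
from numba import cuda

ENABLED = False
try:
    # Raises if no CUDA driver or device is available.
    cuda.get_current_device()
    ENABLED = True
except Exception:
    # Fall back to a disabled JIT groupby-apply path instead of failing the import.
    ENABLED = False
```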
Thanks @vyasr for contributing to this feature. I will coordinate with @brandon-b-miller and you starting next week to get in the loop and to start the draft moving again.
Firstly, thanks to @bwyogatama for this incredible PR, and thanks to @vyasr again for working through the PTX building process. I have a couple of general thoughts for moving this forward:

- If we can make C++ changes that solve the issue of programmatically generating functions for all our types, I think we should pursue that - but I don't think it's worth working too hard on optimizing the C++ for performance at this stage, given how fast this already is relative to vanilla `Groupby.apply`. To that end I suggest focusing first on making the Python robust. I believe this will lead to a usable feature the fastest.
- In `strings_udf`, we punted on the problem of not putting this `.cu` file into `libcudf` by putting it completely into its own package and building it separately. We clearly can't do this for every PTX file we choose to use in this way. Since libcudf likely won't accept this code, the only place I can think of putting it is a `cudf/cpp` directory. This could set a good precedent for projects following this pattern going forward.
- Lastly, I'd like to reorganize some of this code a bit. Typing and CUDA declarations should go in `groupby_typing.py`, and lowering in `groupby_lowering.py` (a rough sketch of this split follows below). It makes it a little easier to know where things are, and where to import them from.

Excited to iterate on this!
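To make that split concrete, here is a very small sketch under stated assumptions: `Group`, its data model, and the module boundaries are illustrative placeholders, not the actual types or layout in this PR.

```python
# groupby_typing.py (hypothetical sketch): Numba type and typing declarations.
from numba import types
from numba.extending import models, register_model


class Group(types.Type):
    """Numba type representing a group of values handed to the UDF (assumption)."""

    def __init__(self, dtype):
        self.dtype = dtype
        super().__init__(name=f"Group({dtype})")


@register_model(Group)
class GroupModel(models.OpaqueModel):
    # Treat the group as an opaque handle at the typing level;
    # the lowering module decides the concrete layout.
    pass


# groupby_lowering.py (hypothetical sketch): lowering lives in its own module and
# imports what it needs from groupby_typing, e.g.:
# from groupby_typing import Group
```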
@brandon-b-miller @vyasr I refactored the C++ files following @bdice's suggestion. I will push more changes on the Python side this weekend.
@bwyogatama Can you please update the base branch to `branch-22.12` and resolve conflicts to unblock CI tests?
Can one of the admins verify this patch? Admins can comment `ok to test` to allow this one PR to run, or `add to allowlist` to allow all future PRs from the same author to run.
ok to test
> @bwyogatama Can you please update the base branch to `branch-22.12` and resolve conflicts to unblock CI tests?
I think I have updated the base branch to `branch-22.12`.
Hi @bwyogatama!
I created a PR into your branch that should clean things up and put us in a position to start moving forward:
https://github.com/bwyogatama/cudf/pull/1
See the comments on that PR for the exact set of changes - the diff looks big for now, but if you merge that PR and retarget this PR to 23.02, you should see the diff collapse again. If this works I can open a few follow-up PRs to help out with the Python here.
> Hi @bwyogatama!
> I created a PR into your branch that should clean things up and put us in a position to start moving forward:
> See the comments on that PR for the exact set of changes - the diff looks big for now, but if you merge that PR and retarget this PR to 23.02, you should see the diff collapse again. If this works I can open a few follow-up PRs to help out with the Python here.
Thanks @brandon-b-miller for the help! I will check on this and get back to you!
I figured out how to push changes directly here so no need for the other PR! :)
FWIW, it seems like one of the merges (or possibly retargeting to a new branch) has marked many comments as outdated that I think are not (so we don't see them in the diff view).
rerun tests
/ok to test
/ok to test
/ok to test
/ok to test
@bwyogatama I've isolated the `idxmax` failure to only the `float64` dtype:
`FAILED python/cudf/cudf/tests/test_groupby.py::test_groupby_apply_jit_reductions[idxmax-dtype1]`
/ok to test
Thanks for your thorough review @wence-! I think I've made a pass on everything you mentioned so far in the Python layer.
/ok to test
/ok to test