Implement groupby apply with JIT
Description
Experimental cuDF Groupby Apply JIT pipeline.
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
@bwyogatama We should move the `function.cu` file to a place like `cudf/cpp/src/udf/kernels.cu` (just the first place that came to mind, feel free to suggest a better one) so that this PR is exposed to C++ reviewers.
This CUDA source needs to be compiled to a PTX file so that it can be used by the Python code. I'm not sure what the best way to do this in cudf would be. @robertmaynard Any pointers are highly appreciated.
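For background on the Python side of this, here is a minimal sketch of how Numba can link a prebuilt PTX file into a JIT-compiled kernel. The file name `kernels.ptx` and the device function `mul_two` are hypothetical placeholders, not names from this PR:

```python
# Minimal sketch, assuming a device function `mul_two` compiled into kernels.ptx
# (both names are hypothetical). Numba links the PTX when JIT-compiling the kernel.
from numba import cuda, types

# Declare the extern "C" device function defined in the PTX file.
mul_two = cuda.declare_device("mul_two", types.float64(types.float64))

@cuda.jit(link=["kernels.ptx"])
def apply_kernel(inp, out):
    i = cuda.grid(1)
    if i < inp.size:
        out[i] = mul_two(inp[i])
```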
If we want to compile to PTX I think the cleanest CMake solution is to create an object library with just this file and then set `CUDA_PTX_COMPILATION`.
Also we should probably discuss reusing atomics etc. from libcudf, although I'm not yet sure how we want that to look since I don't think we want to make those part of the libcudf public API.
> If we want to compile to PTX I think the cleanest CMake solution is to create an object library with just this file and then set `CUDA_PTX_COMPILATION`.
> Also we should probably discuss reusing atomics etc. from libcudf, although I'm not yet sure how we want that to look since I don't think we want to make those part of the libcudf public API.
This is correct, but you will need to iterate over the values of `CMAKE_CUDA_ARCHITECTURES` and create a new object library for each value, since PTX compilation only supports a single arch:
```cmake
set(ptx_src "src/a.cu")
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
  add_library(ptx_example_${arch} OBJECT ${ptx_src})
  set_target_properties(ptx_example_${arch}
    PROPERTIES CUDA_ARCHITECTURES ${arch}
               CUDA_PTX_COMPILATION ON
  )
endforeach()
```
We need to update the install rules to also ship the ptx output
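As a rough illustration only, an install rule for the per-arch PTX objects might look like the sketch below; the destination, component, and target names are assumptions, not what this PR ended up doing:

```cmake
# Hypothetical sketch: install the PTX produced by each per-arch object library.
# The destination and component names are placeholders.
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
  install(FILES $<TARGET_OBJECTS:ptx_example_${arch}>
          DESTINATION lib/ptx
          COMPONENT cudf)
endforeach()
```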
> We should move the `function.cu` file to a place like `cudf/cpp/src/udf/kernels.cu` (just the first place that came to mind, feel free to suggest a better one) so that this PR is exposed to C++ reviewers.
We went back and forth on this. We considered leaving it somewhere in the python area since it serves only the python API. I think it's a good idea that it is owned by the c++ code owners though.
> We need to update the install rules to also ship the ptx output
We're going to need these PTX files to come as part of the conda packages as well. Both of these issues also need to be solved in https://github.com/rapidsai/cudf/pull/11319, which relies on the same pattern, so if we can solve them here that's great! :)
This is super cool to see! Can you also post one of the early benchmark plots comparing performance?
This PR has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled `inactive-90d` if there is no activity in the next 60 days.
@bwyogatama @brandon-b-miller I've updated the build system here to get the JIT code path to compile as part of the library. I've also merged in the latest `branch-22.10` and made some minor fixes to the testing code so that it gives us useful diagnostics for now. There are small differences between cudf and pandas groupby that cause some issues, so I'm working around them for now so that we can focus on tests that verify the JIT code path works roughly as expected.
Codecov Report
Base: 86.58% // Head: 85.83% // Decreases project coverage by -0.75% :warning:
Coverage data is based on head (eaa8ff7) compared to base (b6dccb3). Patch has no changes to coverable lines.
Additional details and impacted files
```diff
@@              Coverage Diff               @@
##           branch-23.02   #11452      +/-  ##
================================================
- Coverage        86.58%   85.83%    -0.75%
================================================
  Files              155      158        +3
  Lines            24368    25164      +796
================================================
+ Hits             21098    21599      +501
- Misses            3270     3565      +295
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cudf/cudf/_version.py | 1.41% <0.00%> (-98.59%) | :arrow_down: |
| python/cudf/cudf/core/buffer/spill_manager.py | 72.50% <0.00%> (-7.50%) | :arrow_down: |
| python/cudf/cudf/core/udf/utils.py | 93.75% <0.00%> (-5.35%) | :arrow_down: |
| python/strings_udf/strings_udf/__init__.py | 66.66% <0.00%> (-2.69%) | :arrow_down: |
| python/cudf/cudf/utils/dtypes.py | 77.85% <0.00%> (-1.61%) | :arrow_down: |
| python/cudf/cudf/options.py | 86.11% <0.00%> (-1.59%) | :arrow_down: |
| python/cudf/cudf/core/single_column_frame.py | 94.30% <0.00%> (-1.27%) | :arrow_down: |
| python/cudf/cudf/io/json.py | 91.04% <0.00%> (-1.02%) | :arrow_down: |
| ...ython/custreamz/custreamz/tests/test_dataframes.py | 98.38% <0.00%> (-1.01%) | :arrow_down: |
| python/dask_cudf/dask_cudf/io/csv.py | 96.34% <0.00%> (-1.00%) | :arrow_down: |
| ... and 57 more | | |
@brandon-b-miller the tests now pass. The outstanding build failures are happening because some of the build steps run on systems without GPUs. I ported over the logic I wrote for `strings_udf` to select a PTX file for the appropriate architecture, and that code includes a numba call to get the current device. In `strings_udf` that call is inside a try-except, so I'm assuming that on these CI executors we just set `strings_udf.ENABLED = False` and otherwise continue happily. We need to be able to `import cudf` on a system without the CUDA driver available, so we need to do something similar in this PR: the `groupby_function` module should execute the driver version check conditionally, only when a CUDA driver is available. I'll let you and @bwyogatama decide exactly what that should look like in terms of the structure of the code as you reorganize.
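For illustration, a minimal sketch of that kind of guard, assuming a module-level `ENABLED` flag like the one in `strings_udf`; the names are placeholders, not the final structure of this PR:

```python
# Hypothetical sketch: only run the driver/device check when a CUDA driver is usable,
# so `import cudf` still works on machines without a GPU (e.g. some CI executors).
from numba import cuda

ENABLED = False
try:
    # Raises if no CUDA driver or device is available.
    cuda.get_current_device()
    ENABLED = True
except Exception:
    # Fall back to a disabled JIT groupby-apply path instead of failing the import.
    ENABLED = False
```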
Thanks @vyasr for contributing to this feature. I will coordinate with @brandon-b-miller and you starting next week to get in the loop and to start the draft moving again.
Firstly, thanks to @bwyogatama for this incredible PR, and thanks to @vyasr again for working through the PTX building process. I have a couple of general thoughts for moving this forward:

- If we can make C++ changes that solve the issue of programmatically generating functions for all our types, I think we should pursue that - but I don't think it's worth working too hard on optimizing the C++ for performance at this stage, given how fast this already is relative to vanilla `Groupby.apply`. To that end I suggest focusing first on making the Python robust. I believe this will lead to a usable feature the fastest.
- In `strings_udf`, we punted on the problem of not putting this `.cu` file into `libcudf` by putting it completely into its own package and building it separately. We clearly can't do this for every PTX file we choose to use in this way. Since libcudf likely won't accept this code, the only place I can think of putting it is a `cudf/cpp` directory. This could set a good precedent for projects following this pattern going forward.
- Lastly, I'd like to reorganize some of this code a bit. Typing and CUDA declarations should go in `groupby_typing.py`, and lowering in `groupby_lowering.py` (a rough sketch of this split follows below). It makes it a little easier to know where things are, and where to import them from.

Excited to iterate on this!
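To make that split concrete, here is a very small sketch under stated assumptions: `Group`, its data model, and the module boundaries are illustrative placeholders, not the actual types or layout in this PR.

```python
# groupby_typing.py (hypothetical sketch): Numba type and typing declarations.
from numba import types
from numba.extending import models, register_model


class Group(types.Type):
    """Numba type representing a group of values handed to the UDF (assumption)."""

    def __init__(self, dtype):
        self.dtype = dtype
        super().__init__(name=f"Group({dtype})")


@register_model(Group)
class GroupModel(models.OpaqueModel):
    # Treat the group as an opaque handle at the typing level;
    # the lowering module decides the concrete layout.
    pass


# groupby_lowering.py (hypothetical sketch): lowering lives in its own module and
# imports what it needs from groupby_typing, e.g.:
# from groupby_typing import Group
```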
@brandon-b-miller @vyasr I refactored the C++ files following @bdice's suggestion. I will push more changes on the Python side this weekend.
@bwyogatama Can you please update the base branch to `branch-22.12` and resolve conflicts to unblock CI tests?
Can one of the admins verify this patch? Admins can comment `ok to test` to allow this one PR to run, or `add to allowlist` to allow all future PRs from the same author to run.
ok to test
> @bwyogatama Can you please update the base branch to `branch-22.12` and resolve conflicts to unblock CI tests?
I think I have updated the base branch to `branch-22.12`.
Hi @bwyogatama!
I created a PR into your branch that should clean things up and put us in a position to start moving forward:
https://github.com/bwyogatama/cudf/pull/1
See the comments on that PR for the exact set of changes - the diff looks big for now, but if you merge that PR and retarget this PR to 23.02, you should see the diff collapse again. If this works I can open a few follow-up PRs to help out with the Python here.
> Hi @bwyogatama!
> I created a PR into your branch that should clean things up and put us in a position to start moving forward:
> See the comments on that PR for the exact set of changes - the diff looks big for now, but if you merge that PR and retarget this PR to 23.02, you should see the diff collapse again. If this works I can open a few follow-up PRs to help out with the Python here.
Thanks @brandon-b-miller for the help! I will check on this and get back to you!
I figured out how to push changes directly here so no need for the other PR! :)
FWIW, it seems like one of the merges (or possibly retargeting to a new branch) has marked many comments as outdated that I think are not (so we don't see them in the diff view).
rerun tests
/ok to test
/ok to test
/ok to test
/ok to test
@bwyogatama I've isolated the `idxmax` failure to only the `float64` dtype:
`FAILED python/cudf/cudf/tests/test_groupby.py::test_groupby_apply_jit_reductions[idxmax-dtype1]`
/ok to test
Thanks for your thorough review @wence-! I think I've made a pass on everything you mentioned so far in the Python layer.
/ok to test
/ok to test