cudf icon indicating copy to clipboard operation
cudf copied to clipboard

Implement groupby apply with JIT

Open bwyogatama opened this issue 2 years ago • 6 comments

Description

Experimental cuDF Groupby Apply JIT pipeline.

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [] The documentation is up to date with these changes.

bwyogatama avatar Aug 03 '22 19:08 bwyogatama

@bwyogatama We should move the function.cu file to a place like cudf/cpp/src/udf/kernels.cu (just the first thing came into my mind, feel free to suggest the proper place) to make this PR exposed to cpp reviewers.

This CUDA source is required to be compiled to a ptx file in order to be used by the python code. Not sure what would be the best way to do this in cudf. @robertmaynard Any pointers are highly appreciated.

PointKernel avatar Aug 03 '22 20:08 PointKernel

If we want to compile to PTX I think the cleanest CMake solution is to create an object library with just this file and then set CUDA_PTX_COMPILATION.

Also we should probably discuss reusing atomics etc from libcudf, although I'm not yet sure how we want that to look since I don't think we want to make those part of the libcudf public API.

vyasr avatar Aug 04 '22 16:08 vyasr

If we want to compile to PTX I think the cleanest CMake solution is to create an object library with just this file and then set CUDA_PTX_COMPILATION.

Also we should probably discuss reusing atomics etc from libcudf, although I'm not yet sure how we want that to look since I don't think we want to make those part of the libcudf public API.

This is correct, but you will need to iterate over the values of CMAKE_CUDA_ARCHITECTURES and create a new object library for each value since PTX compilation only supports a single arch.

set(ptx_src "src/a.cu")
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
    add_library(ptx_example_${arch} OBJECT ${ptx_src})
    set_target_properties(ptx_example_${arch}
        PROPERTIES CUDA_ARCHITECTURES ${arch}
                   CUDA_PTX_COMPILATION ON
       )
endforeach()

We need to update the install rules to also ship the ptx output

robertmaynard avatar Aug 04 '22 18:08 robertmaynard

We should move the function.cu file to a place like cudf/cpp/src/udf/kernels.cu (just the first thing came into my mind, feel free to suggest the proper place) to make this PR exposed to cpp reviewers.

We went back and forth on this. We considered leaving it somewhere in the python area since it serves only the python API. I think it's a good idea that it is owned by the c++ code owners though.

We need to update the install rules to also ship the ptx output

We're going to need these PTX files to come as part of the conda packages as well. Both of these issues need to be solved in https://github.com/rapidsai/cudf/pull/11319 as well which relies on the same pattern, so if we can solve them here that's great! :)

brandon-b-miller avatar Aug 05 '22 14:08 brandon-b-miller

This is super cool to see! Can you also post one of the early benchmarks plot comparing performance?

quasiben avatar Aug 08 '22 14:08 quasiben

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Sep 18 '22 19:09 github-actions[bot]

@bwyogatama @brandon-b-miller I've updated the build system here to get the JIT code path to compile as part of the library. I've also merged in the latest branch-22.10 and made some minor fixes to the testing code so that it'll give us useful diagnostics for now. There are small differences between cudf and pandas groupby that cause some issues, so I'm just working around them for now so that we can focus on having testing that verifies that the JIT code path works about as expected.

vyasr avatar Sep 24 '22 00:09 vyasr

Codecov Report

Base: 86.58% // Head: 85.83% // Decreases project coverage by -0.75% :warning:

Coverage data is based on head (eaa8ff7) compared to base (b6dccb3). Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #11452      +/-   ##
================================================
- Coverage         86.58%   85.83%   -0.75%     
================================================
  Files               155      158       +3     
  Lines             24368    25164     +796     
================================================
+ Hits              21098    21599     +501     
- Misses             3270     3565     +295     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) :arrow_down:
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) :arrow_down:
python/cudf/cudf/core/udf/utils.py 93.75% <0.00%> (-5.35%) :arrow_down:
python/strings_udf/strings_udf/__init__.py 66.66% <0.00%> (-2.69%) :arrow_down:
python/cudf/cudf/utils/dtypes.py 77.85% <0.00%> (-1.61%) :arrow_down:
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) :arrow_down:
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) :arrow_down:
python/cudf/cudf/io/json.py 91.04% <0.00%> (-1.02%) :arrow_down:
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) :arrow_down:
python/dask_cudf/dask_cudf/io/csv.py 96.34% <0.00%> (-1.00%) :arrow_down:
... and 57 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Sep 24 '22 02:09 codecov[bot]

@brandon-b-miller the tests now pass. The outstanding build failures are happening because some of the build steps happen on systems without GPUs. I ported over the logic I wrote for strings_udf to select a PTX file for the appropriate architecture, and that code includes a numba call to get the current device. In strings_udf that's inside a try-except, so I'm assuming that on these CI executors we just set strings_udf.ENABLED = False and otherwise continue happily. We need to be able to import cudf on a system without the CUDA driver available, so we need to do something similar in this PR where the groupby_function module executes the driver version check conditionally, only when a CUDA driver is enabled. I'll let you and @bwyogatama decide exactly what that should look like in terms of the structure of the code as you reorganize.

vyasr avatar Sep 24 '22 15:09 vyasr

Thanks @vyasr for contributing to this feature. I will coordinate with @brandon-b-miller and you starting next week to get in the loop and to start the draft moving again.

bwyogatama avatar Sep 24 '22 20:09 bwyogatama

Firstly, thanks to @bwyogatama for this incredible PR, and thanks to @vyasr again for working through the PTX building process. I have a couple of general thoughts for moving this forward:

  1. If we can make c++ changes that solve the issue of programmatically generating functions for all our types, I think we should pursue that - but I don't think it's worth working too hard to optimizing the c++ for performance at this stage given how fast this is already relative to vanilla Groupby.apply. To that end I suggest focusing first on making the python robust. I believe this will lead to a useable feature the fastest.
  2. In strings_udf, we punted on the problem of not putting this cu file into libcudf by putting it completely into it's own package and building it separately. We clearly can't do this for every PTX file we choose to use in this way. Since libcudf likely wont accept this code, the only place I can think of putting it is a cudf/cpp directory. This maybe could set a good precedent for projects following this pattern going forward.
  3. Lastly, I'd like to reorganize some of this code a bit. Typing and cuda declarations should go in groupby_typing.py. Lowering in groupby_lowering.py. It makes it a little easier to know where things are, and where to import them from.

Excited to iterate on this!

brandon-b-miller avatar Oct 19 '22 20:10 brandon-b-miller

@brandon-b-miller @vyasr I refactored the C++ files following @bdice suggestion. I will push more changes on the python side this weekend.

bwyogatama avatar Oct 28 '22 17:10 bwyogatama

@bwyogatama Can you please update the base branch to branch-22.12 and resolve conflicts to unblock CI tests?

PointKernel avatar Nov 02 '22 15:11 PointKernel

Can one of the admins verify this patch?

Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.

GPUtester avatar Nov 02 '22 15:11 GPUtester

ok to test

PointKernel avatar Nov 02 '22 15:11 PointKernel

@bwyogatama Can you please update the base branch to branch-22.12 and resolve conflicts to unblock CI tests?

I think I have updated the base branch to branch-22.12

bwyogatama avatar Nov 02 '22 23:11 bwyogatama

Hi @bwyogatama !

I created a PR into your branch that should clean things up and put us in a position to start moving forward:

https://github.com/bwyogatama/cudf/pull/1

See the comments on that PR for the exact set of changes - the diff looks big for now but if you merge that PR and retarget this PR to 23.02, you should see the diff collapse again. If this works I can open a few follow up PR's to help out with the python here.

brandon-b-miller avatar Nov 21 '22 18:11 brandon-b-miller

Hi @bwyogatama !

I created a PR into your branch that should clean things up and put us in a position to start moving forward:

bwyogatama#1

See the comments on that PR for the exact set of changes - the diff looks big for now but if you merge that PR and retarget this PR to 23.02, you should see the diff collapse again. If this works I can open a few follow up PR's to help out with the python here.

Thanks @brandon-b-miller for the help! I will check on this and get back to you!

bwyogatama avatar Nov 22 '22 01:11 bwyogatama

I figured out how to push changes directly here so no need for the other PR! :)

brandon-b-miller avatar Dec 13 '22 16:12 brandon-b-miller

FWIW, it seems like one of the merges (or possibly retargeting to a new branch) has marked as outdated many comments which I think are not (so in the diff view we don't see them).

wence- avatar Dec 13 '22 16:12 wence-

rerun tests

brandon-b-miller avatar Jan 06 '23 17:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 06 '23 17:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 06 '23 17:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 08 '23 16:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 08 '23 18:01 brandon-b-miller

@bwyogatama I've isolated the idxmax failure to be only related to the float64 dtype:

FAILED python/cudf/cudf/tests/test_groupby.py::test_groupby_apply_jit_reductions[idxmax-dtype1]

brandon-b-miller avatar Jan 09 '23 19:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 09 '23 19:01 brandon-b-miller

thanks for your thorough review @wence- I think I made a pass on everything you mentioned so far in the python layer.

brandon-b-miller avatar Jan 17 '23 19:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 17 '23 19:01 brandon-b-miller

/ok to test

brandon-b-miller avatar Jan 19 '23 01:01 brandon-b-miller