
[FEA] Support more dtypes in JIT GroupBy `apply`

brandon-b-miller opened this issue 2 years ago

Is your feature request related to a problem? Please describe. When https://github.com/rapidsai/cudf/pull/11452 lands, we'll get JIT `GroupBy.apply` for a subset of UDFs and, importantly, dtypes. However, over the summer we only got as far as writing overloads for float64 and int64 dtypes in the user's source data. It'd be nice if we could support more dtypes, starting with at least the rest of the numeric types.

Describe the solution you'd like Extend the existing `groupby.apply(engine='jit')` framework to support the following additional dtypes:

  • float32
  • int32
  • int16
  • int8
  • uint64
  • uint32
  • uint16
  • uint8
  • bool

A lot of the machinery in the original PR is fairly general and should make adding many of these easy; however, there will undoubtedly be edge cases. As such, this makes for a pretty good first issue for anyone jumping into the numba extension piece of the codebase.
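For reference, the per-group reduction semantics these overloads need to reproduce can be expressed in plain Python. This is only a minimal illustrative sketch with made-up sample data, not cudf's implementation:

```python
# Minimal pure-Python reference for a grouped sum, the kind of per-group
# reduction that groupby.apply(engine="jit") computes. Sample data is
# hypothetical; this only illustrates the expected semantics.

def grouped_sum(keys, values):
    """Return {group_key: sum_of_values}, preserving first-seen key order."""
    out = {}
    for k, v in zip(keys, values):
        out[k] = out.get(k, 0) + v
    return out

keys = [1, 1, 2, 2, 3, 3]
values = [2, 4, 0, 12, 30, 20]
print(grouped_sum(keys, values))  # {1: 6, 2: 12, 3: 50}
```

The JIT engine has to produce these same results for each of the dtypes listed above, which is why each dtype needs its own device-side overloads.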

brandon-b-miller avatar Jan 25 '23 15:01 brandon-b-miller

Hi, Brandon. I'm interested in this part of cuDF, so I can give it a try in a few weeks.

dlee992 avatar Feb 06 '23 11:02 dlee992

Hi @dlee992 , thanks so much for your interest in taking this on! I thought I'd share a little about how I'd approach this just to get you off the ground.

The main tests I recommend trying to pass first would be these ones, which test simple reductions. This test is parameterized over a list of supported types we hardcode in groupby_typing. I tried adding int32 locally, and it looks like a lot works already when running the test:

error   : Undefined reference to 'BlockMin_int32' in '<cudapy-ptx>'

This hopefully means that adding a couple of lines around here in the C++ library could pass a bunch of tests right off the bat if we're lucky :)
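The general pattern here is that a hardcoded table of supported dtypes drives the parameterized tests, so extending coverage starts by appending to that table. A rough sketch of the idea, with hypothetical names (`SUPPORTED_TYPES` and `make_test_cases` are illustrative, not cudf's actual identifiers):

```python
# Hypothetical sketch of a supported-dtype table driving parameterized
# tests. The names are illustrative only, not cudf's real identifiers.
SUPPORTED_TYPES = ["float64", "int64"]

def make_test_cases(extra_types=()):
    """Build (dtype, reduction) test parameters from the supported table."""
    types = list(SUPPORTED_TYPES) + list(extra_types)
    reductions = ["sum", "min", "max"]
    return [(t, r) for t in types for r in reductions]

# Adding "int32" to the table immediately grows the test matrix:
cases = make_test_cases(extra_types=["int32"])
print(len(cases))  # 9 cases: 3 dtypes x 3 reductions
```

In other words, the Python-side change is small; the failures that remain (like the missing `BlockMin_int32` symbol above) point at device-side definitions that still need to be added.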

Happy to help at any time on the RAPIDS goAI Slack by the way, just ping me with any questions. Good luck!

brandon-b-miller avatar Feb 08 '23 20:02 brandon-b-miller

Thanks for your kind comments! I have a little experience with Numba code, but I'm a newbie in cuDF and GPU testing. Anyway, since I think this is an interesting issue, I will give it a try as a side project.

In fact, I am now trying to get some GPU devices for local testing.

======

Update: I got access to several GPU cards in an internal cluster.

dlee992 avatar Feb 09 '23 02:02 dlee992

@brandon-b-miller is this issue still valid? I see that this works now, for instance:

In [96]: cudf.DataFrame({
    ...:     "a": cudf.Series([1, 1, 2, 2, 3, 3], dtype="float32"),
    ...:     "b": cudf.Series([2, 4, 0, 12, 30, 20], dtype="float32"),
    ...: }).groupby("a").apply(
    ...:     lambda gp: gp['b'].sum(),
    ...:     engine="jit",
    ...: )
Out[96]:
a
1.0     6.0
2.0    12.0
3.0    50.0
dtype: float32

I'm going to close as resolved for the moment, but please reopen if I missed something.

Note that I'm only concerned with the numeric types mentioned above; I don't think this issue is the right place to track e.g. string or struct UDFs.

vyasr avatar May 17 '24 16:05 vyasr

I believe we only have definitions for 4- and 8-byte int and float types for now. It'd be useful if we had implementations for the int8, int16, etc. types, but IIRC the problem was missing features on the CUDA side. It might have been a lack of atomics for those widths.

https://github.com/rapidsai/cudf/blob/branch-24.06/python/cudf/udf_cpp/shim.cu#L652
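One common workaround when narrow-width atomics are unavailable is to accumulate in a wider type and then cast the result back with the narrow dtype's wraparound semantics. A minimal pure-Python sketch of that idea (illustrative only, not the shim.cu implementation):

```python
# Illustrative sketch: accumulate int8 values in a wide (arbitrary-precision)
# accumulator, then wrap the result back into the int8 range, mimicking the
# widen-then-cast workaround used when 1- and 2-byte atomics are missing.

def wrap_int8(x):
    """Reduce an arbitrary integer into the int8 range [-128, 127]."""
    return (x + 128) % 256 - 128

def sum_as_int8(values):
    total = sum(values)      # wide accumulation; no narrow atomics needed
    return wrap_int8(total)  # cast back with int8 overflow behavior

print(sum_as_int8([100, 100]))  # 200 overflows to -56 in int8
```

Whether this is acceptable for cudf depends on matching pandas/NumPy overflow behavior for these dtypes, which is one of the edge cases mentioned earlier in the thread.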

brandon-b-miller avatar May 17 '24 16:05 brandon-b-miller

Ah OK. Perhaps we should keep this issue open for the moment, then.

vyasr avatar May 17 '24 18:05 vyasr