
[FEA] Support more dtypes in JIT GroupBy `apply`

brandon-b-miller opened this issue 2 years ago

Is your feature request related to a problem? Please describe. When https://github.com/rapidsai/cudf/pull/11452 lands, we'll get JIT `GroupBy.apply` for a subset of UDFs and, importantly, dtypes. However, over the summer we only got as far as writing overloads for float64 and int64 dtypes in the user's source data. It'd be nice if we could support more dtypes, starting with at least the rest of the numeric types.

Describe the solution you'd like Extend the existing `groupby.apply(engine='jit')` framework to support the following additional dtypes:

  • float32
  • int32
  • int16
  • int8
  • uint64
  • uint32
  • uint16
  • uint8
  • bool

A lot of the machinery in the original PR is fairly general and should make adding many of these easy; however, there will undoubtedly be edge cases. As such, this makes for a pretty good first issue for anyone jumping into the numba extension piece of the codebase.
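For reference, the per-group reduction semantics these overloads need to reproduce can be expressed in plain Python. This is only a minimal illustrative sketch with made-up sample data, not cudf's implementation:

```python
# Minimal pure-Python reference for a grouped sum, the kind of per-group
# reduction that groupby.apply(engine="jit") computes. Sample data is
# hypothetical; this only illustrates the expected semantics.

def grouped_sum(keys, values):
    """Return {group_key: sum_of_values}, preserving first-seen key order."""
    out = {}
    for k, v in zip(keys, values):
        out[k] = out.get(k, 0) + v
    return out

keys = [1, 1, 2, 2, 3, 3]
values = [2, 4, 0, 12, 30, 20]
print(grouped_sum(keys, values))  # {1: 6, 2: 12, 3: 50}
```

The JIT engine has to produce these same results for each of the dtypes listed above, which is why each dtype needs its own device-side overloads.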

brandon-b-miller avatar Jan 25 '23 15:01 brandon-b-miller

Hi, Brandon. I'm interested in this part of cuDF, so I can give it a try in a few weeks.

dlee992 avatar Feb 06 '23 11:02 dlee992

Hi @dlee992 , thanks so much for your interest in taking this on! I thought I'd share a little about how I'd approach this just to get you off the ground.

The main tests I recommend trying to pass first would be these ones, which test simple reductions. This test is parameterized over a list of supported types we hardcode in groupby_typing. I tried adding int32 locally, and it looks like a lot works already when running the test:

error   : Undefined reference to 'BlockMin_int32' in '<cudapy-ptx>'

This hopefully means that adding a couple of lines around here in the C++ library could pass a bunch of tests right off the bat if we're lucky :)
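The general pattern here is that a hardcoded table of supported dtypes drives the parameterized tests, so extending coverage starts by appending to that table. A rough sketch of the idea, with hypothetical names (`SUPPORTED_TYPES` and `make_test_cases` are illustrative, not cudf's actual identifiers):

```python
# Hypothetical sketch of a supported-dtype table driving parameterized
# tests. The names are illustrative only, not cudf's real identifiers.
SUPPORTED_TYPES = ["float64", "int64"]

def make_test_cases(extra_types=()):
    """Build (dtype, reduction) test parameters from the supported table."""
    types = list(SUPPORTED_TYPES) + list(extra_types)
    reductions = ["sum", "min", "max"]
    return [(t, r) for t in types for r in reductions]

# Adding "int32" to the table immediately grows the test matrix:
cases = make_test_cases(extra_types=["int32"])
print(len(cases))  # 9 cases: 3 dtypes x 3 reductions
```

In other words, the Python-side change is small; the failures that remain (like the missing `BlockMin_int32` symbol above) point at device-side definitions that still need to be added.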

Happy to help at any time on the RAPIDS goAI Slack by the way, just ping me with any questions. Good luck!

brandon-b-miller avatar Feb 08 '23 20:02 brandon-b-miller

Thanks for your kind comments! I have a little experience with Numba code, but I'm a newbie in cuDF and GPU testing. Anyway, since I think this is an interesting issue, I will give it a try as a side project.

In fact, I am now trying to get some GPU devices for local testing.

======

Update: I got access to several GPU cards in an internal cluster.

dlee992 avatar Feb 09 '23 02:02 dlee992

@brandon-b-miller is this issue still valid? I see that this works now, for instance:

In [96]: cudf.DataFrame({
    ...:     "a": cudf.Series([1, 1, 2, 2, 3, 3], dtype="float32"),
    ...:     "b": cudf.Series([2, 4, 0, 12, 30, 20], dtype="float32"),
    ...: }).groupby("a").apply(
    ...:     lambda gp: gp['b'].sum(),
    ...:     engine="jit",
    ...: )
Out[96]:
a
1.0     6.0
2.0    12.0
3.0    50.0
dtype: float32

I'm going to close as resolved for the moment, but please reopen if I missed something.

Note that I'm only concerned with the numeric types mentioned above; I don't think this issue is the right place to track e.g. string or struct UDFs.

vyasr avatar May 17 '24 16:05 vyasr

I believe we only have definitions for 4- and 8-byte int and float types for now. It'd be useful if we had implementations for the int8, int16, etc. types, but IIRC the problem was missing features on the CUDA side. It might have been a lack of atomics for those widths.

https://github.com/rapidsai/cudf/blob/branch-24.06/python/cudf/udf_cpp/shim.cu#L652
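One common workaround when narrow-width atomics are unavailable is to accumulate in a wider type and then cast the result back with the narrow dtype's wraparound semantics. A minimal pure-Python sketch of that idea (illustrative only, not the shim.cu implementation):

```python
# Illustrative sketch: accumulate int8 values in a wide (arbitrary-precision)
# accumulator, then wrap the result back into the int8 range, mimicking the
# widen-then-cast workaround used when 1- and 2-byte atomics are missing.

def wrap_int8(x):
    """Reduce an arbitrary integer into the int8 range [-128, 127]."""
    return (x + 128) % 256 - 128

def sum_as_int8(values):
    total = sum(values)      # wide accumulation; no narrow atomics needed
    return wrap_int8(total)  # cast back with int8 overflow behavior

print(sum_as_int8([100, 100]))  # 200 overflows to -56 in int8
```

Whether this is acceptable for cudf depends on matching pandas/NumPy overflow behavior for these dtypes, which is one of the edge cases mentioned earlier in the thread.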

brandon-b-miller avatar May 17 '24 16:05 brandon-b-miller

Ah OK. Perhaps we should keep this issue open for the moment, then.

vyasr avatar May 17 '24 18:05 vyasr