[FEA] Improve Null-Aware Operator Support in AST-Codegen
Description
This pull request follows up on #19831 and extends support for null-aware filters and transforms. Addresses #20177.
Changeset Summary
- Adds a new `null_output` policy that tells the transform not to generate null masks. This is important for null-aware operators that are intended to discard null values: `null_output::NON_NULLABLE` suppresses null-mask generation for the transform result (see the sketch below).
- Adds `null` context to `row_ir`'s nodes to generate code for null-specific contexts.
- Adds `is_null_aware` and `is_always_nonullable` attributes to `row_ir`'s nodes as an optimization to avoid generating null masks in the common case.
- Adds a `may_evaluate_null` attribute to the kernel to signify that a null mask is not needed for the transform operation.
- Minor refactoring
- [x] Fix null propagation in filters when null values are selected, i.e., `NULL_LOGICAL_AND` operators (previously a failing test). (Fixed in https://github.com/rapidsai/cudf/pull/20222)
- Tests for these changes
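As a rough illustration of the `null_output` policy above, here is a minimal sketch. `NON_NULLABLE` is the member named in this PR; the other member name and the helper below are hypothetical, and the real definitions live in `types.hpp`/`transform.hpp`:

```cpp
// Hypothetical sketch of the null_output policy, not the actual cuDF API.
// NON_NULLABLE is the member named in this PR; PRESERVE is a placeholder
// name for the default "keep computing a null mask" behavior.
enum class null_output : bool {
  PRESERVE,     // compute/propagate a null mask for the transform result
  NON_NULLABLE  // operator discards nulls (e.g. IS_NULL), so skip mask work
};

// Placeholder helper showing how codegen could decide whether to emit a
// null mask for the result column.
inline bool needs_result_null_mask(null_output policy, bool any_input_nullable)
{
  return policy != null_output::NON_NULLABLE && any_input_nullable;
}
```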
Story https://github.com/rapidsai/cudf/issues/18023
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
A couple of general comments (they don't necessarily need to be addressed in this PR, but I would like them taken care of in a follow-up PR):
- We should move the enums `null_aware` and `null_output` from `types.hpp` to `transform.hpp` so that they have a better scope.
- Per my understanding, `null_aware` should instead be renamed to `uses_input_nullability` to indicate that a function/operator can/will/may use the nullability info of its input column(s) to produce (maybe non-null) output values.
- Similarly, per my understanding, `null_output` should be renamed to `output_nullability` with members {`PRESERVE_INPUT`, `NON_NULLABLE`} (see the sketch below).
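Roughly, something like the following hypothetical sketch of the suggested names (the member names of `uses_input_nullability` are placeholders, not existing cuDF identifiers):

```cpp
// Hypothetical sketch of the suggested renaming, not the current cuDF API.

// Whether an operator consumes the nullability of its input column(s).
enum class uses_input_nullability : bool { NO, YES };  // member names are placeholders

// Whether the transform result carries a null mask.
enum class output_nullability : bool {
  PRESERVE_INPUT,  // result nullability follows the inputs
  NON_NULLABLE     // result never has a null mask (e.g. IS_NULL discards nulls)
};
```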
> - Per my understanding, `null_aware` should instead be renamed to `uses_input_nullability` to indicate that a function/operator can/will/may use the nullability info of its input column(s) to produce (maybe non-null) output values.
> - Similarly, per my understanding, `null_output` should be renamed to `output_nullability` with members {`PRESERVE_INPUT`, `NON_NULLABLE`}.
+1 for both suggestions
@lamarrr Looks like the transform IsNull test is failing after the refactor https://github.com/rapidsai/cudf/actions/runs/19548836715/job/55976548396?pr=20206#step:11:827
/ok to test 4d6e088
Can you please re-submit your approvals @mhaseeb123 @bdice @shrshi
/ok to test
pr / conda-cpp-tests / 13.0.2, 3.13, arm64, ubuntu22.04, l4, latest-driver, latest-deps (push)
The C++ unit tests are frozen on runner group 'nv-gpu-arm64-l4-1gpu'.
PARQUET_TEST started but has not passed; the runner might be stuck in this test.
@lamarrr What does fdd5458 actually do to fix the problem? Did we identify a root cause?
If there's some kind of locking behavior causing deadlocks or an issue with NVRTC, we need to dig deeper, because that would mean this feature isn't stable enough to use yet. If it's just running out of GPU memory (the normal reason we use `GPUS 1 PERCENT 100` or similar), we should use the `gtest_memory_usage.sh` script to measure how much GPU memory each test consumes at peak and share that data in this PR conversation.
@bdice I noticed LISTS_TEST was hanging when PARQUET_TEST was killed. I looked at the GPU percentage CMake config for these tests and observed they were set to 70 and 30, meaning they were likely running concurrently.
This happens only on the latest driver; we'll investigate further and submit a bug report once we create an MRE. Our initial impression is that it is due to an NVRTC bug, as the Grafana logs show excessive peak RAM usage (>54 GB).
Readers of `cpp/tests/CMakeLists.txt` cannot tell that this problem exists, so the percent values used by those tests are now "magic". Please open a cuDF issue explaining what we know about the bug so far, and add comments linking to that issue. Also, please make LISTS_TEST and PARQUET_TEST use 100% of the GPU to avoid any other possible overlaps that trigger the bug.
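For concreteness, a rough sketch of what that could look like in `cpp/tests/CMakeLists.txt`, assuming cuDF's `ConfigureTest` wrapper with its `GPUS`/`PERCENT` arguments; the source lists below are placeholders and the exact invocations in the repo may differ:

```cmake
# Sketch only: give each of these suites exclusive use of a full GPU so they
# never run concurrently, and point readers at the tracking issue requested
# above for what is known about the hang on arm64 + L4 with the latest driver.
ConfigureTest(
  PARQUET_TEST ${PARQUET_TEST_SOURCES} # placeholder for the existing source list
  GPUS 1
  PERCENT 100
)
ConfigureTest(
  LISTS_TEST ${LISTS_TEST_SOURCES} # placeholder for the existing source list
  GPUS 1
  PERCENT 100
)
```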
Does this manifest on x86 or only ARM? CUDA 12 or only 13? Is it hardware-specific? When you say "latest driver" does that mean only 580 or are 575, 550, etc. also affected? (Can we only rule out 535?) I am hesitant to merge something that introduces a known hang under load in general, especially before we have an MRE and can assess the causes more closely.
In the CI runs, it manifested only on the arm64 + L4 system. I am not able to replicate it on a GH200 system.
We have access to x86 + L4 already.
The combination that hung is pr / conda-cpp-tests / 13.0.2, 3.13, arm64, ubuntu22.04, l4, latest-driver, latest-deps (push)
The CUDA 12.9 + earliest-driver combination on arm64 + L4 did not hang.
The job took 56+ GB when it hung, so this could be an NVRTC bug on the newer driver with CUDA 13.0 on L4. We don't have a repro on another system yet.
I've posted all the information I could gather about the failing tests in https://github.com/rapidsai/cudf/issues/20817
In addition, I have some ideas for further improving the bitmask computation, but I'll defer them to a later PR.
I've asked for @lamarrr to split the segfault fix into a separate PR before we merge.
I've split the segfault fix into: https://github.com/rapidsai/cudf/pull/20841
/merge