[FEA] Improve Null-Aware Operator Support in AST-Codegen
Description
This pull request follows up on #19831 and extends support for null-aware filters and transforms. Addresses #20177.
Changeset Summary
- Adds a new `null_output` policy that tells the transform not to generate null masks. This is important for null-aware operators that are intended to discard null values: `null_output::NON_NULLABLE` suppresses null-mask generation for the transform result (see the sketch below).
- Adds `null` context to `row_ir`'s nodes to generate code for null-specific contexts.
- Adds `is_null_aware` and `is_always_nonullable` attributes to `row_ir`'s nodes as an optimization to avoid generating null masks in the common case.
- Adds a `may_evaluate_null` attribute to the kernel to signify that a null mask is not needed for the transform operation.
- Minor refactoring
- [x] Fix null propagation in filters when null values are selected, i.e., `NULL_LOGICAL_AND` operators (previously a failing test). (Fixed in https://github.com/rapidsai/cudf/pull/20222)
- Tests for these changes
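As a rough illustration of the `null_output` policy above, here is a minimal sketch. `NON_NULLABLE` is the member named in this PR; the other member name and the helper below are hypothetical, and the real definitions live in `types.hpp`/`transform.hpp`:

```cpp
// Hypothetical sketch of the null_output policy, not the actual cuDF API.
// NON_NULLABLE is the member named in this PR; PRESERVE is a placeholder
// name for the default "keep computing a null mask" behavior.
enum class null_output : bool {
  PRESERVE,     // compute/propagate a null mask for the transform result
  NON_NULLABLE  // operator discards nulls (e.g. IS_NULL), so skip mask work
};

// Placeholder helper showing how codegen could decide whether to emit a
// null mask for the result column.
inline bool needs_result_null_mask(null_output policy, bool any_input_nullable)
{
  return policy != null_output::NON_NULLABLE && any_input_nullable;
}
```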
Story https://github.com/rapidsai/cudf/issues/18023
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
A couple of general comments (they don't necessarily need to be addressed in this PR, but I would like them taken care of in a follow-up PR):
- We should move the enums `null_aware` and `null_output` from `types.hpp` to `transform.hpp` so that they have a better scope.
- Per my understanding, `null_aware` should instead be renamed to `uses_input_nullability` to indicate that a function/operator can/will/may use the nullability info of its input column(s) to produce (maybe non-null) output values.
- Similarly, per my understanding, `null_output` should be renamed to `output_nullability` with members {`PRESERVE_INPUT`, `NON_NULLABLE`} (see the sketch below).
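Roughly, something like the following hypothetical sketch of the suggested names (the member names of `uses_input_nullability` are placeholders, not existing cuDF identifiers):

```cpp
// Hypothetical sketch of the suggested renaming, not the current cuDF API.

// Whether an operator consumes the nullability of its input column(s).
enum class uses_input_nullability : bool { NO, YES };  // member names are placeholders

// Whether the transform result carries a null mask.
enum class output_nullability : bool {
  PRESERVE_INPUT,  // result nullability follows the inputs
  NON_NULLABLE     // result never has a null mask (e.g. IS_NULL discards nulls)
};
```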
> - Per my understanding, `null_aware` should instead be renamed to `uses_input_nullability` to indicate that a function/operator can/will/may use the nullability info of its input column(s) to produce (maybe non-null) output values.
> - Similarly, per my understanding, `null_output` should be renamed to `output_nullability` with members {`PRESERVE_INPUT`, `NON_NULLABLE`}.
+1 for both suggestions
@lamarrr Looks like the transform IsNull test is failing after the refactor https://github.com/rapidsai/cudf/actions/runs/19548836715/job/55976548396?pr=20206#step:11:827
/ok to test 4d6e088
Can you please re-submit your approvals @mhaseeb123 @bdice @shrshi
/ok to test
pr / conda-cpp-tests / 13.0.2, 3.13, arm64, ubuntu22.04, l4, latest-driver, latest-deps (push)
The C++ unit tests are frozen on runner group 'nv-gpu-arm64-l4-1gpu'.
PARQUET_TEST started but has not passed; the runner might be stuck in this test.
@lamarrr What does fdd5458 actually do to fix the problem? Did we identify a root cause?
If there's some kind of locking behavior causing deadlocks or an issue with NVRTC, we need to dig deeper, because that would mean this feature isn't stable enough to use yet. If it's just running out of GPU memory (the normal reason we use `GPUS 1 PERCENT 100` or similar), we should use the `gtest_memory_usage.sh` script to measure how much GPU memory each test consumes at peak and share that data in this PR conversation.
@bdice I noticed LISTS_TEST was hanging when PARQUET_TEST was killed. I looked at the GPU percentage CMake config for these tests and observed they were set to 70 and 30, meaning they were likely running concurrently.
This happens only on the latest driver; we'll investigate further and submit a bug report once we create an MRE. Our initial impression is that it is due to an NVRTC bug, as the Grafana logs show excessive peak RAM usage (>54 GB).
Readers of `cpp/tests/CMakeLists.txt` cannot tell that this problem exists, so the percent values used by those tests are now "magic". Please open a cuDF issue explaining what we know about the bug so far, and add comments linking to that issue. Also, please make LISTS_TEST and PARQUET_TEST use 100% of the GPU to avoid any other possible overlaps that trigger the bug.
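For concreteness, a rough sketch of what that could look like in `cpp/tests/CMakeLists.txt`, assuming cuDF's `ConfigureTest` wrapper with its `GPUS`/`PERCENT` arguments; the source lists below are placeholders and the exact invocations in the repo may differ:

```cmake
# Sketch only: give each of these suites exclusive use of a full GPU so they
# never run concurrently, and point readers at the tracking issue requested
# above for what is known about the hang on arm64 + L4 with the latest driver.
ConfigureTest(
  PARQUET_TEST ${PARQUET_TEST_SOURCES} # placeholder for the existing source list
  GPUS 1
  PERCENT 100
)
ConfigureTest(
  LISTS_TEST ${LISTS_TEST_SOURCES} # placeholder for the existing source list
  GPUS 1
  PERCENT 100
)
```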
Does this manifest on x86 or only ARM? CUDA 12 or only 13? Is it hardware-specific? When you say "latest driver" does that mean only 580 or are 575, 550, etc. also affected? (Can we only rule out 535?) I am hesitant to merge something that introduces a known hang under load in general, especially before we have an MRE and can assess the causes more closely.
In the CI runs, it manifested only on the arm64 + L4 system. I am not able to replicate it on a GH200 system.
We have access to x86 + L4 already.
The combination that hung is pr / conda-cpp-tests / 13.0.2, 3.13, arm64, ubuntu22.04, l4, latest-driver, latest-deps (push)
The CUDA 12.9 + earliest-driver combination on arm64 + L4 did not hang.
The job took 56+ GB when it hung, so this could be an NVRTC bug on the newer driver with CUDA 13.0 on L4. We don't have a repro on another system yet.
I've posted all the information I could gather about the failing tests in https://github.com/rapidsai/cudf/issues/20817
In addition, I have some ideas for further improving the bitmask computation, but I'll defer them to a later PR.
I've asked for @lamarrr to split the segfault fix into a separate PR before we merge.
I've split the segfault fix into: https://github.com/rapidsai/cudf/pull/20841
/merge