cudf
Add an environment variable for handling fallback in cudf.pandas
Description
This PR wraps up #14975 and extends PR #15837. It adds a fallback debugging mode to _fast_slow_function_call that emits warnings for the different types of fallback that occur in cudf.pandas. The types of fallback covered are:
- Out-of-memory errors, to help plan No OOM related work
- AttributeErrors for missing functionality
- TypeErrors for differing function signatures
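The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `fast_slow_call`, `_debug_enabled`, and the warning classes are hypothetical names, the builtin `MemoryError` stands in for whatever out-of-memory exception the GPU path actually raises, and the set of accepted values for the environment variable is an assumption. Only the variable name `CUDF_PANDAS_FALLBACK_DEBUGGING` comes from this thread.

```python
import os
import warnings


# Hypothetical warning categories, one per fallback type listed above.
class FallbackWarning(UserWarning):
    pass


class OOMFallbackWarning(FallbackWarning):
    pass


class MissingFunctionalityWarning(FallbackWarning):
    pass


class SignatureMismatchWarning(FallbackWarning):
    pass


def _debug_enabled():
    # Assumption: "True" (as used in this thread) or "1" turns the mode on.
    value = os.environ.get("CUDF_PANDAS_FALLBACK_DEBUGGING", "")
    return value.lower() in ("1", "true")


def fast_slow_call(fast, slow, *args, **kwargs):
    """Try the fast path; on failure, optionally warn, then run the slow path."""
    try:
        return fast(*args, **kwargs)
    except Exception as err:
        if _debug_enabled():
            # Map the exception type to a warning category.
            if isinstance(err, MemoryError):
                category = OOMFallbackWarning
            elif isinstance(err, AttributeError):
                category = MissingFunctionalityWarning
            elif isinstance(err, TypeError):
                category = SignatureMismatchWarning
            else:
                category = FallbackWarning
            warnings.warn(
                f"Falling back to the slow path: {type(err).__name__}: {err}",
                category,
                stacklevel=2,
            )
        return slow(*args, **kwargs)
```

With the variable unset, the sketch falls back silently, so the debugging mode adds no noise unless it is explicitly requested.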
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
/ok to test
Looks pretty good. Could you show a small example of how this would look when using the run-pandas-tests.sh (maybe just run it on one test file with a few tests)?
I'm not seeing warnings where I expect them, which makes me think the environment variable isn't being set when the pandas tests are run. This is the command I'm using.
export CUDF_PANDAS_FALLBACK_DEBUGGING=True && python/cudf/cudf/pandas/scripts/run-pandas-tests.sh -n auto -v -p cudf.pandas tests/groupby/ | grep Warning
@Matt711 Are the warnings going to stderr? The pipe to grep will only capture stdout. You might need something like 2>&1 in there.
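The stdout/stderr point can be checked in isolation with a small sketch. It assumes nothing about cudf or the pandas test script: a child Python process prints one line and issues one warning, and the parent captures the two streams either separately or merged (the equivalent of `2>&1`). The `-W always` flag is passed only so the child's warning filter is predictable.

```python
import subprocess
import sys

# The child prints to stdout and issues a warning (which Python writes to stderr).
child_code = "import warnings; print('to stdout'); warnings.warn('to stderr')"


def run(merge_stderr):
    """Run the child, optionally redirecting stderr into stdout (like 2>&1)."""
    return subprocess.run(
        [sys.executable, "-W", "always", "-c", child_code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT if merge_stderr else subprocess.PIPE,
        text=True,
    )


separate = run(merge_stderr=False)
merged = run(merge_stderr=True)

# A pipe such as `| grep Warning` only sees what is in stdout.
print("stdout only contains the warning:", "UserWarning" in separate.stdout)
print("merged output contains the warning:", "UserWarning" in merged.stdout)
```

When the streams are kept separate, the warning text shows up only in `separate.stderr`, which is why a plain pipe to grep would miss it.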
test_groupby_agg_no_extra_calls should definitely return NotImplemented warnings, but I don't see them in stdout. I don't think the tests are being run with cudf.pandas, despite -p cudf.pandas being passed.
(rapids) coder ➜ ~/cudf $ pytest -v -p cudf.pandas ./pandas-testing/pandas-tests/tests/groupby/aggregate/test_aggregate.py::test_groupby_agg_no_extra_calls 2>&1
============================================================================================================================================ test session starts ============================================================================================================================================
platform linux -- Python 3.10.14, pytest-7.4.4, pluggy-1.5.0 -- /home/coder/.conda/envs/rapids/bin/python3.10
cachedir: .pytest_cache
hypothesis profile 'ci' -> deadline=None, suppress_health_check=[HealthCheck.too_slow, HealthCheck.differing_executors], database=DirectoryBasedExampleDatabase(PosixPath('/home/coder/cudf/.hypothesis/examples'))
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/coder/cudf/pandas-testing/pandas-tests
configfile: pyproject.toml
plugins: anyio-4.4.0, hypothesis-6.103.2, benchmark-4.0.0, cases-3.8.5, cov-5.0.0, xdist-3.6.1
collected 1 item
pandas-testing/pandas-tests/tests/groupby/aggregate/test_aggregate.py::test_groupby_agg_no_extra_calls PASSED [100%]
============================================================================================================================================= 1 passed in 0.11s =============================================================================================================================================
Closing this PR in favor of #16161