Add support for running pandas queries with cudf.pandas enabled

Open vyasr opened this issue 8 months ago • 2 comments

This PR makes it possible to run the pandas queries with GPU acceleration analogous to the support for the Polars GPU engine. To support this, cudf is added to the requirements list (which means we should also be able to run the Polars GPU engine benchmarks with the virtual environment now).

vyasr avatar Apr 23 '25 21:04 vyasr

This needs a rebase.

ritchie46 avatar Apr 26 '25 18:04 ritchie46

Done. However, the last release of cudf has an upper bound on the supported Polars version that bumps us back to 1.25 here. I don't know if that is compatible with the polars cloud bits that you recently added. If you prefer, I can revert the changes adding cudf to the environment and we can continue relying on the *-no-env variants of the Makefile targets for the GPU benchmarks for now, and revisit adding cudf to the environment at a later date.

Note that when we first added GPU benchmarks to this repo, cudf was not yet available on PyPI, only on NVIDIA's pip index, so there was an even stronger reason not to add it to the environment here. Now that we can get cudf from PyPI it is feasible to do this, with the main issue being whether the upper bounds that we impose for stability reasons are prohibitive for your use cases in this repo. Ideally we'd be able to relax that bound eventually, but I don't think we're quite comfortable enough to do that yet.

vyasr avatar Apr 28 '25 19:04 vyasr

The 25.06 release of cudf will support Polars 1.28, so perhaps the best option here is to wait for that release so that we don't have to change the supported Polars version here.

vyasr avatar May 24 '25 01:05 vyasr

@ritchie46 The open question on this PR is that, since cudf-polars currently places an upper bound on polars, having cudf-polars in the environment upper-bounds the version of polars we can have installed until the next release. If that is OK, I can update this PR with the latest main. Otherwise I can simplify this PR by removing the requirements changes and the run-pandas-gpu target (I'll just leave the *-no-env targets).

vyasr avatar Sep 15 '25 15:09 vyasr

> Otherwise I can simplify this PR by removing the requirements changes and the run-pandas-gpu target (I'll just leave the *-no-env targets).

I think I'd prefer that. Can you also rebase? I think that should satisfy mypy.

ritchie46 avatar Sep 16 '25 06:09 ritchie46

> I think I'd prefer that. Can you also rebase? I think that should satisfy mypy.

Both done. Unfortunately still seeing mypy errors. I opened https://github.com/pola-rs/polars-benchmark/pull/173 to resolve the outstanding issues with CI.

vyasr avatar Sep 18 '25 01:09 vyasr