Tom Augspurger
**Describe the bug** When creating a `plc.Table.from_arrow` on a pyarrow Table with a *sliced* `string_view` column, something seems to be off about the validity map: **Steps/Code to reproduce bug** ```python...
**Describe the bug** In string ops like `.str.starts_with` we incorrectly fill missing values with `False` instead of propagating the NA when using cudf-polars' streaming executor with multiple partitions. **Steps/Code to...
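The expected behavior can be illustrated without cudf-polars at all. The sketch below is plain Python, not the library's implementation: the helper names and the sample partitions are hypothetical, and `None` stands in for a missing value. It contrasts propagating the missing value with filling it with `False`.

```python
from typing import Optional


def starts_with_propagate(values: list[Optional[str]], prefix: str) -> list[Optional[bool]]:
    # Expected semantics: a missing input yields a missing output.
    return [None if v is None else v.startswith(prefix) for v in values]


def starts_with_filled(values: list[Optional[str]], prefix: str) -> list[bool]:
    # The behavior described in the report: missing inputs become False.
    return [False if v is None else v.startswith(prefix) for v in values]


# Hypothetical data split across two partitions.
partitions = [["apple", None], ["avocado", "banana"]]

propagated = [r for part in partitions for r in starts_with_propagate(part, "a")]
filled = [r for part in partitions for r in starts_with_filled(part, "a")]
```

The two results differ only at the position of the missing value, which is where a multi-partition result would diverge from a single-partition one.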
**Describe the bug** The test `python/cudf_polars/tests/test_groupby.py::test_groupby[no_maintain_order-col("key1")-col("uint16_with_null").sum()-col("uint16_with_null").mean().alias("mean")]` fails with a small blocksize. The issue seems to be related to how we aggregate missing values. **Steps/Code to reproduce bug** ```python import polars...
**Describe the bug** The test `python/cudf_polars/tests/test_join.py::test_non_coalesce_join[left-nulls_not_equal-join_expr0]` fails when using a small blocksize / multiple partitions. **Steps/Code to reproduce bug** Here's a simplified example ```python import polars as pl from cudf_polars.testing.asserts...
**Describe the bug** This snippet produces a result with different dtypes using the streaming executor, depending on whether there's more than one partition. **Steps/Code to reproduce bug** ```python import polars...
This adds a new test option ``--blocksize-mode`` to the test runner, which lets us easily exercise the multi-partition code paths of the streaming executor. When ``--blocksize-mode=small`` and ``--executor="streaming"`` the default...
This changes `CachingVisitor.state` to use a `TypedDict`. Previously, we used a `Mapping[str, Any]`, which had two problems: 1. Risks typos in the key names causing unexpected KeyErrors 2. The `Any`...
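The difference between the two annotations can be sketched as follows. This is a minimal illustration using only the standard library. The class name `VisitorState` and its keys are hypothetical and are not the actual keys used by `CachingVisitor.state`.

```python
from typing import Any, Mapping, TypedDict


class VisitorState(TypedDict):
    # Hypothetical keys for illustration only.
    partition_count: int
    executor: str


def describe_untyped(state: Mapping[str, Any]) -> str:
    # With Mapping[str, Any], a typo such as state["executer"] is only
    # caught at runtime as a KeyError, and every value is typed as Any.
    return f"{state['executor']}:{state['partition_count']}"


def describe_typed(state: VisitorState) -> str:
    # With a TypedDict, a static type checker knows the allowed keys and
    # the type of each value, so a misspelled key is reported before running.
    return f"{state['executor']}:{state['partition_count']}"


state: VisitorState = {"partition_count": 4, "executor": "streaming"}
```

At runtime a `TypedDict` is an ordinary `dict`, so both functions accept the same value. The benefit comes from static checking.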
**Describe the bug** There's a strange interaction between `rmm.statistics` and cuDF's `spill=True` option, where the first time a `cudf.DataFrame` is initialized with this option set, the initial time `rmm.push_statistics(); rmm.pop_statistics()`...
This updates how we define the cuDF spilling metric and test. The primary motivation is to make it easier to test this within the main `pytest` process. Previously, the test...
**Describe the issue**: I get a `RuntimeError` with some dask-ml code that worked with dask / distributed 2024.10.0 and earlier. With 2024.11.0 and newer, it fails: **Minimal Complete Verifiable Example**:...