Tom Augspurger issues

Results: 181 issues of Tom Augspurger

[BUG]: Incorrect missing values in pylibcudf.Table.from_arrow on a sliced, string_view column

**Describe the bug** When creating a `plc.Table.from_arrow` on a pyarrow Table with a *sliced* `string_view` column, something seems to be off about the validity map: **Steps/Code to reproduce bug** ```python...

bug

pylibcudf

[BUG]: NA values incorrectly filled with `False` in String ops with streaming executor and multiple partitions

**Describe the bug** In string ops like `.str.starts_with` we incorrectly fill missing values with `False` instead of propagating the NA when using cudf-polars' streaming executor with multiple partitions. **Steps/Code to...

bug

[BUG]: Incorrect result `group_by().mean()` with experimental streaming executor, multiple partitions, and missing values

**Describe the bug** The test `python/cudf_polars/tests/test_groupby.py::test_groupby[no_maintain_order-col("key1")-col("uint16_with_null").sum()-col("uint16_with_null").mean().alias("mean")]` fails with a small blocksize. The issue seems to be related to how we aggregate missing values. **Steps/Code to reproduce bug** ```python import polars...

bug

cudf.polars
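As background for the `group_by().mean()` report above, here is a minimal plain-Python sketch of one general way to combine a mean across partitions when values are missing. It is not the cudf-polars implementation and makes no claim about the actual cause of the failure; the function names and the data are hypothetical.

```python
from __future__ import annotations


def partial_mean_state(values: list[int | None]) -> tuple[int, int]:
    """Return (sum, count) over the non-missing values of one partition."""
    present = [v for v in values if v is not None]
    return sum(present), len(present)


def combine_mean(states: list[tuple[int, int]]) -> float | None:
    """Combine per-partition (sum, count) pairs into a single mean."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    # A group with no non-missing values has no mean, so stay missing.
    return total / count if count else None


# Hypothetical values for one group, split across two partitions.
partitions = [[1, None, 3], [None, None, 8]]
single = [v for part in partitions for v in part]

expected = combine_mean([partial_mean_state(single)])
partitioned = combine_mean([partial_mean_state(p) for p in partitions])
assert partitioned == expected == 4.0
```

Averaging the two per-partition means (2.0 and 8.0) would instead give 5.0, because the partitions hold different numbers of non-missing values.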

[BUG]: Incorrect result in non-coalesce join with experimental streaming executor and multiple partitions

**Describe the bug** The test `python/cudf_polars/tests/test_join.py::test_non_coalesce_join[left-nulls_not_equal-join_expr0]` fails when using a small blocksize / multiple partitions. **Steps/Code to reproduce bug** Here's a simplified example ```python import polars as pl from cudf_polars.testing.asserts...

bug

cudf.polars

[BUG]: Dtype mismatch between partitioned and non-partitioned aggregation with experimental streaming executor in some aggregations

**Describe the bug** This snippet produces a result with different dtypes using the streaming executor, depending on whether there's more than one partition. **Steps/Code to reproduce bug** ```python import polars...

bug

cudf.polars

Configurable blocksize mode for streaming executor in unit tests

This adds a new test option ``--blocksize-mode`` to the test runner, which lets us easily exercise the multi-partition code paths of the streaming executor. When ``--blocksize-mode=small`` and ``--executor="streaming"`` the default...

tests

Python

non-breaking

cudf.polars
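A test option like the `--blocksize-mode` described above is typically registered through pytest's `pytest_addoption` hook in a `conftest.py`. The sketch below only illustrates that mechanism: the `"default"` choice, the help text, and the `get_blocksize_mode` helper are assumptions, not taken from the actual change.

```python
# conftest.py (illustrative sketch, not the actual cudf-polars conftest)


def pytest_addoption(parser):
    """Register a --blocksize-mode command-line option with pytest."""
    parser.addoption(
        "--blocksize-mode",
        action="store",
        default="default",  # assumed name for the non-small mode
        choices=("small", "default"),
        help="Blocksize mode used by the streaming executor in tests.",
    )


def get_blocksize_mode(config):
    """Hypothetical helper: read the option back from a pytest config."""
    return config.getoption("--blocksize-mode")
```

With such a hook in place, a run like `pytest --executor="streaming" --blocksize-mode=small` would make the chosen mode available to fixtures through `request.config`.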

Use TypedDict for CachingVisitor.state

This changes `CachingVisitor.state` to use a `TypedDict`. Previously, we used a `Mapping[str, Any]`, which had two problems: 1. It risks typos in the key names causing unexpected `KeyError`s 2. The `Any`...

Python

improvement

non-breaking

cudf.polars
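To illustrate the difference between a `Mapping[str, Any]` and a `TypedDict` for this kind of state, here is a small sketch. The `VisitorState` class and its keys are hypothetical; the real keys of `CachingVisitor.state` may differ.

```python
from __future__ import annotations

from typing import Any, Mapping, TypedDict


class VisitorState(TypedDict):
    """Hypothetical state keys, chosen only for illustration."""

    depth: int
    name: str


def describe_untyped(state: Mapping[str, Any]) -> str:
    # A typo such as state["nmae"] passes type checking here and only
    # fails at runtime with a KeyError; every value is typed as Any.
    return f"{state['name']} at depth {state['depth']}"


def describe_typed(state: VisitorState) -> str:
    # A static type checker can reject an unknown key such as
    # state["nmae"] and knows that state["depth"] is an int.
    return f"{state['name']} at depth {state['depth']}"


state: VisitorState = {"depth": 2, "name": "root"}
assert describe_typed(state) == describe_untyped(state) == "root at depth 2"
```

At runtime a `TypedDict` is an ordinary `dict`, so the benefit comes entirely from static checking.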

regression

[BUG]: Strange interaction between cuDF spilling and rmm.statistics

Restructure cudf spill metrics and test

`RuntimeError: Not enough arguments provided: missing keys` in `dask.persist` with mix of `Future` and `Delayed`