Tom Augspurger issues

Results: 181 issues by Tom Augspurger

Failure in `check_n_features_in_after_fitting` in `tests/test_kmeans.py::test_check_estimator`

This is currently failing

```
tests/test_kmeans.py::test_check_estimator FAILED

=================================== FAILURES ===================================
_____________________________ test_check_estimator _____________________________

    def test_check_estimator():
        with warnings.catch_warnings(record=True):
            warnings.simplefilter("ignore", RuntimeWarning)
>           check_estimator(DKKMeans())

tests/test_kmeans.py:28:
_ _ _ _ _ _ _ _...
```

`tests/test_incremental_pca.py::test_whitening[auto]` failing

```
❯ pytest -vs tests/test_incremental_pca.py::test_whitening[auto] (base)
============================= test session starts ==============================
platform darwin -- Python 3.12.8, pytest-8.3.4, pluggy-1.5.0 -- /Users/toaugspurger/gh/dask/.direnv/python-3.12/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/Users/toaugspurger/gh/dask/dask-ml/.hypothesis/examples'))
rootdir: /Users/toaugspurger/gh/dask/dask-ml
configfile:...
```

Codec pipeline memory usage

We discussed memory usage on Friday's community call. https://github.com/TomAugspurger/zarr-python-memory-benchmark started to look at some stuff. https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-uncompressed.html has the memray flamegraph for reading an uncompressed array (400 MB total, split into...

performance

Refactor CodecPipeline for flexibility

### Zarr version

v3

### Numcodecs version

na

### Python Version

na

### Operating System

na

### Installation

na

### Description

Currently, the `CodecPipeline` interface works by passing around `Iterable[tuple[...]]`...

enhancement

Added json schema

This adds a pair of [json schema](https://json-schema.org/docs) schemas to the repository. One for Array metadata and one for Group metadata. For those unfamiliar with json-schema, it's a language for validating...
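To make "a language for validating" concrete, here is a minimal sketch of what a schema check does. It is a toy that handles only four JSON Schema keywords (`type`, `required`, `properties`, `const`), not a real JSON Schema implementation. The helper `check` and the `group_schema` dict are hypothetical, and the field names assume Zarr v3 group metadata; the schemas actually added to the repository may look different.

```python
def check(instance, schema):
    """Return a list of error strings; an empty list means the instance passed.

    Toy subset of JSON Schema: only `type`, `const`, `required`, `properties`.
    """
    errors = []
    types = {"object": dict, "array": list, "string": str, "integer": int}

    expected = schema.get("type")
    if expected is not None and not isinstance(instance, types[expected]):
        # No point descending into a value of the wrong type.
        return [f"expected {expected}"]

    if "const" in schema and instance != schema["const"]:
        errors.append(f"expected constant {schema['const']!r}")

    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                # Prefix nested errors with the property name.
                errors.extend(f"{name}: {e}" for e in check(instance[name], subschema))

    return errors


# Hypothetical schema for group metadata (assumes Zarr v3 field names).
group_schema = {
    "type": "object",
    "required": ["zarr_format", "node_type"],
    "properties": {
        "zarr_format": {"const": 3},
        "node_type": {"const": "group"},
        "attributes": {"type": "object"},
    },
}

valid = {"zarr_format": 3, "node_type": "group", "attributes": {}}
invalid = {"zarr_format": 2, "node_type": "group"}

print(check(valid, group_schema))
print(check(invalid, group_schema))
```

A full validator covers many more keywords than this sketch, which is the reason to publish a schema rather than hand-written checks.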

ENH: Add `unit` argument to `to_datetime` and `to_timedelta` to avoid value-dependent parsing

### Feature Type

- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas

### Problem Description

Over...

Enhancement

Needs Triage

Require rapidsmpf Pool with the rapidsmpf runtime

## Description

This updates our config to require using the rapidsmpf CUDA Stream Pool with the rapidsmpf runtime. As we move pieces of the IR execution into native rapidsmpf IR...

bug

Python

breaking

cudf-polars

Ensure IR.do_evaluate signature is correct

## Description

This updates the signature of our `IR.do_evaluate` nodes to follow our convention of being a classmethod with `*non_child_args, *children, *, context`. It also adds a new pre-commit hook...
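As a rough illustration of the convention described above, the sketch below defines a node whose `do_evaluate` is a classmethod taking non-child arguments, then children, then a keyword-only `context`, along with a small `inspect`-based check. The classes `IR`, `Select`, and `BadNode` here are hypothetical stand-ins, and `follows_convention` is an assumed helper; it is not the pre-commit hook from the PR, whose behavior is not shown in this summary.

```python
import inspect


class IR:
    """Hypothetical stand-in for the IR base class."""


class Select(IR):
    """Hypothetical node: one non-child argument, one child."""

    @classmethod
    def do_evaluate(cls, columns, child, *, context):
        # Non-child args first, then children, then keyword-only context.
        return [child[name] for name in columns]


class BadNode(IR):
    """Hypothetical node that breaks the convention: context is positional."""

    @classmethod
    def do_evaluate(cls, context, child):
        return child


def follows_convention(node_type):
    """True if do_evaluate is a classmethod whose only keyword-only
    parameter is named `context`."""
    raw = inspect.getattr_static(node_type, "do_evaluate")
    if not isinstance(raw, classmethod):
        return False
    params = inspect.signature(node_type.do_evaluate).parameters.values()
    keyword_only = [p.name for p in params if p.kind is inspect.Parameter.KEYWORD_ONLY]
    return keyword_only == ["context"]


print(follows_convention(Select))
print(follows_convention(BadNode))
```

This check only looks at the method kind and the keyword-only part of the signature; it cannot tell non-child arguments from children, which would need information from the node definitions themselves.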

bug

libcudf

Python

Java

non-breaking

cudf.pandas

cudf-polars

Add cudf-polars option to control rapidsmpf Shuffle insertion method

## Description

rapidsmpf's `Shuffler` has two methods for inserting chunks into the shuffler:

- [`insert_chunks`](https://docs.rapids.ai/api/rapidsmpf/nightly/api/#rapidsmpf.shuffler.Shuffler.insert_chunks)
- [`concat_insert`](https://docs.rapids.ai/api/rapidsmpf/nightly/api/#rapidsmpf.shuffler.Shuffler.concat_insert)

This adds a new option, `shuffler_insertion_method`, to the streaming executor config to control...

libcudf

Python

CMake

improvement

non-breaking

cudf-polars

Require using rapidsmpf Stream Pool with rapidsmpf runtime

Spotted in https://github.com/rapidsai/cudf/pull/20662#discussion_r2578159439, rapidsmpf's native `read_parquet` node will produce data that's stream ordered on some CUDA stream from rapidsmpf's stream pool. It's not clear to me how this interacts with...

cudf-polars