Results 181 issues of Tom Augspurger

This is currently failing

```
tests/test_kmeans.py::test_check_estimator FAILED

=================================== FAILURES ===================================
_____________________________ test_check_estimator _____________________________

    def test_check_estimator():
        with warnings.catch_warnings(record=True):
            warnings.simplefilter("ignore", RuntimeWarning)
>           check_estimator(DKKMeans())

tests/test_kmeans.py:28:
_ _ _ _ _ _ _ _...
```

```
❯ pytest -vs tests/test_incremental_pca.py::test_whitening[auto] (base)
============================= test session starts ==============================
platform darwin -- Python 3.12.8, pytest-8.3.4, pluggy-1.5.0 -- /Users/toaugspurger/gh/dask/.direnv/python-3.12/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/Users/toaugspurger/gh/dask/dask-ml/.hypothesis/examples'))
rootdir: /Users/toaugspurger/gh/dask/dask-ml
configfile:...
```

We discussed memory usage on Friday's community call. https://github.com/TomAugspurger/zarr-python-memory-benchmark started to look at some stuff. https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-uncompressed.html has the memray flamegraph for reading an uncompressed array (400 MB total, split into...

performance

### Zarr version

v3

### Numcodecs version

na

### Python Version

na

### Operating System

na

### Installation

na

### Description

Currently, the `CodecPipeline` interface works by passing around `Iterable[tuple[...]]`...

enhancement

This adds a pair of [json schema](https://json-schema.org/docs) schemas to the repository. One for Array metadata and one for Group metadata. For those unfamiliar with json-schema, it's a language for validating...
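For readers unfamiliar with the idea, here is a minimal sketch of what a schema for group metadata might look like. The field names (`zarr_format`, `node_type`, `attributes`) assume the Zarr v3 metadata layout and are not taken from the schemas in this pull request. The `check` helper is a hypothetical, hand-rolled stand-in that only understands `required`, `const`, and the `object` type; real validation would use a full JSON Schema validator.

```python
import json

# Hypothetical schema for group metadata. Field names assume the Zarr v3
# layout; the schemas added by the pull request may differ.
GROUP_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "zarr_format": {"const": 3},
        "node_type": {"const": "group"},
        "attributes": {"type": "object"},
    },
    "required": ["zarr_format", "node_type"],
}


def check(document, schema):
    """Return a list of problems found in `document`.

    Illustration only: handles 'required', 'const', and the 'object' type,
    a tiny subset of what JSON Schema can express.
    """
    if schema.get("type") == "object" and not isinstance(document, dict):
        return ["document is not a JSON object"]

    problems = []
    for key in schema.get("required", []):
        if key not in document:
            problems.append(f"missing required key: {key}")

    for key, rule in schema.get("properties", {}).items():
        if key not in document:
            continue
        if "const" in rule and document[key] != rule["const"]:
            problems.append(f"{key} must equal {rule['const']!r}")
        if rule.get("type") == "object" and not isinstance(document[key], dict):
            problems.append(f"{key} must be an object")
    return problems


good = json.loads('{"zarr_format": 3, "node_type": "group", "attributes": {}}')
bad = json.loads('{"zarr_format": 2}')

good_problems = check(good, GROUP_SCHEMA)
bad_problems = check(bad, GROUP_SCHEMA)
```

The point of keeping the rules in a plain JSON document is that any language with a JSON Schema validator can apply the same checks, without depending on a particular implementation.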

### Feature Type

- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas

### Problem Description

Over...

Enhancement
Needs Triage

## Description

This updates our config to require using the rapidsmpf CUDA Stream Pool with the rapidsmpf runtime. As we move pieces of the IR execution into native rapidsmpf IR...

bug
Python
breaking
cudf-polars

## Description

This updates the signature of our `IR.do_evaluate` nodes to follow our convention of being a classmethod with `*non_child_args, *children, *, context`. It also adds a new pre-commit hook...

bug
libcudf
Python
Java
non-breaking
cudf.pandas
cudf-polars
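The `do_evaluate` convention mentioned above (a classmethod taking the non-child arguments first, then the evaluated children, then a keyword-only `context`) can be sketched roughly as follows. This is a simplified illustration, not the cudf-polars implementation: the `Context`, `Literal`, and `Scale` classes and the `_non_child_args` attribute are assumptions made up for the example.

```python
class Context:
    """Hypothetical evaluation context; here it only records call order."""

    def __init__(self):
        self.log = []


class IR:
    """Minimal stand-in for an IR node (not the real cudf-polars class)."""

    _non_child_args = ()
    children = ()

    @classmethod
    def do_evaluate(cls, *args, context):
        raise NotImplementedError

    def evaluate(self, *, context):
        # Evaluate children first, then call do_evaluate with the non-child
        # arguments, followed by the child results, plus keyword-only context.
        child_results = [child.evaluate(context=context) for child in self.children]
        return self.do_evaluate(*self._non_child_args, *child_results, context=context)


class Literal(IR):
    """Leaf node: one non-child argument, no children."""

    def __init__(self, values):
        self._non_child_args = (tuple(values),)
        self.children = ()

    @classmethod
    def do_evaluate(cls, values, *, context):
        context.log.append("Literal")
        return list(values)


class Scale(IR):
    """One non-child argument (factor) and one child."""

    def __init__(self, factor, child):
        self._non_child_args = (factor,)
        self.children = (child,)

    @classmethod
    def do_evaluate(cls, factor, child, *, context):
        context.log.append("Scale")
        return [factor * x for x in child]


ctx = Context()
result = Scale(2, Literal([1, 2, 3])).evaluate(context=ctx)
```

Because `do_evaluate` is a classmethod that receives everything it needs as arguments, it does not depend on node instance state, and making `context` keyword-only keeps it from being confused with a positional child result.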

## Description

rapidsmpf's `Shuffler` has two methods for inserting chunks into the shuffler:

- [`insert_chunks`](https://docs.rapids.ai/api/rapidsmpf/nightly/api/#rapidsmpf.shuffler.Shuffler.insert_chunks)
- [`concat_insert`](https://docs.rapids.ai/api/rapidsmpf/nightly/api/#rapidsmpf.shuffler.Shuffler.concat_insert)

This adds a new option, `shuffler_insertion_method`, to the streaming executor config to control...

libcudf
Python
CMake
improvement
non-breaking
cudf-polars

Spotted in https://github.com/rapidsai/cudf/pull/20662#discussion_r2578159439, rapidsmpf's native `read_parquet` node will produce data that's stream ordered on some CUDA stream from rapidsmpf's stream pool. It's not clear to me how this interacts with...

cudf-polars