Tom Augspurger
This is currently failing

```
tests/test_kmeans.py::test_check_estimator FAILED

================================== FAILURES ===================================
____________________________ test_check_estimator _____________________________

    def test_check_estimator():
        with warnings.catch_warnings(record=True):
            warnings.simplefilter("ignore", RuntimeWarning)
>           check_estimator(DKKMeans())

tests/test_kmeans.py:28:
_ _ _ _ _ _ _ _...
```
```
❯ pytest -vs tests/test_incremental_pca.py::test_whitening[auto] (base)
============================= test session starts ==============================
platform darwin -- Python 3.12.8, pytest-8.3.4, pluggy-1.5.0 -- /Users/toaugspurger/gh/dask/.direnv/python-3.12/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/Users/toaugspurger/gh/dask/dask-ml/.hypothesis/examples'))
rootdir: /Users/toaugspurger/gh/dask/dask-ml
configfile:...
```
We discussed memory usage on Friday's community call. https://github.com/TomAugspurger/zarr-python-memory-benchmark started to look at some stuff. https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-uncompressed.html has the memray flamegraph for reading an uncompressed array (400 MB total, split into...
### Zarr version

v3

### Numcodecs version

na

### Python Version

na

### Operating System

na

### Installation

na

### Description

Currently, the `CodecPipeline` interface works by passing around `Iterable[tuple[...]]`...
This adds a pair of [json schema](https://json-schema.org/docs) schemas to the repository. One for Array metadata and one for Group metadata. For those unfamiliar with json-schema, it's a language for validating...
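To make the idea concrete, here is a minimal sketch of what validating Group metadata against a schema could look like. The schema below is a hypothetical, heavily trimmed illustration, not the schema added by this change, and `check` is a hand-rolled helper that understands only the `required`, `const`, and `type: object` keywords. A real setup would use a full JSON Schema validator instead.

```python
# Hypothetical, trimmed-down schema for a Zarr v3 group metadata document.
# It is NOT the schema added to the repository; it only illustrates the idea.
GROUP_SCHEMA = {
    "type": "object",
    "required": ["zarr_format", "node_type"],
    "properties": {
        "zarr_format": {"const": 3},
        "node_type": {"const": "group"},
        "attributes": {"type": "object"},
    },
}


def check(document: dict, schema: dict) -> list[str]:
    """Return a list of problems found in `document`.

    Only `required`, `const`, and `type: object` are handled here; this is
    a teaching aid, not a JSON Schema implementation.
    """
    errors = []
    for key in schema.get("required", []):
        if key not in document:
            errors.append(f"missing required key: {key!r}")
    for key, rule in schema.get("properties", {}).items():
        if key not in document:
            continue
        value = document[key]
        if "const" in rule and value != rule["const"]:
            errors.append(f"{key!r} must equal {rule['const']!r}, got {value!r}")
        if rule.get("type") == "object" and not isinstance(value, dict):
            errors.append(f"{key!r} must be an object")
    return errors


valid = {"zarr_format": 3, "node_type": "group", "attributes": {"title": "demo"}}
invalid = {"zarr_format": 2}

valid_errors = check(valid, GROUP_SCHEMA)
invalid_errors = check(invalid, GROUP_SCHEMA)
```

The valid document produces no errors, while the invalid one is reported twice: once for the missing `node_type` and once for the wrong `zarr_format`. The appeal of a schema is that these rules live in data rather than in validation code.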
### Feature Type

- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas

### Problem Description

Over...
## Description This updates our config to require using the rapidsmpf CUDA Stream Pool with the rapidsmpf runtime. As we move pieces of the IR execution into native rapidsmpf IR...
## Description

This updates the signature of our `IR.do_evaluate` nodes to follow our convention of being a classmethod with `*non_child_args, *children, *, context`. It also adds a new pre-commit hook...
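As a rough illustration of that calling convention, the sketch below uses a toy `IR` hierarchy. Everything in it (`ExecutionContext`, `Literal`, `Scale`, `get_non_child_args`, `evaluate`) is a hypothetical stand-in chosen for the example, not the project's actual classes. The point is only the shape of `do_evaluate`: a classmethod that takes the node's own arguments first, then the already-evaluated children, then a keyword-only `context`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContext:
    """Hypothetical stand-in for per-evaluation state."""

    label: str


class IR:
    """Hypothetical base node; real nodes carry far more state."""

    def __init__(self, *children):
        self.children = children

    def get_non_child_args(self) -> tuple:
        return ()

    @classmethod
    def do_evaluate(cls, *args, context):
        raise NotImplementedError

    def evaluate(self, *, context):
        # Evaluate children first, then dispatch with the agreed ordering:
        # non-child args, then child results, then keyword-only context.
        child_results = [c.evaluate(context=context) for c in self.children]
        return type(self).do_evaluate(
            *self.get_non_child_args(), *child_results, context=context
        )


class Literal(IR):
    def __init__(self, values):
        super().__init__()
        self.values = values

    def get_non_child_args(self) -> tuple:
        return (self.values,)

    @classmethod
    def do_evaluate(cls, values, *, context):
        return list(values)


class Scale(IR):
    def __init__(self, factor, child):
        super().__init__(child)
        self.factor = factor

    def get_non_child_args(self) -> tuple:
        return (self.factor,)

    @classmethod
    def do_evaluate(cls, factor, child, *, context):
        return [factor * x for x in child]


result = Scale(2, Literal([1, 2, 3])).evaluate(context=ExecutionContext("demo"))
```

Because `context` sits after the bare `*`, passing it positionally raises a `TypeError`, which is the kind of mistake a uniform signature makes easy to catch mechanically.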
## Description

rapidsmpf's `Shuffler` has two methods for inserting chunks into the shuffler:

- [`insert_chunks`](https://docs.rapids.ai/api/rapidsmpf/nightly/api/#rapidsmpf.shuffler.Shuffler.insert_chunks)
- [`concat_insert`](https://docs.rapids.ai/api/rapidsmpf/nightly/api/#rapidsmpf.shuffler.Shuffler.concat_insert)

This adds a new option, `shuffler_insertion_method`, to the streaming executor config to control...
Spotted in https://github.com/rapidsai/cudf/pull/20662#discussion_r2578159439, rapidsmpf's native `read_parquet` node will produce data that's stream ordered on some CUDA stream from rapidsmpf's stream pool. It's not clear to me how this interacts with...