Tom Augspurger comments

Results: 1078 comments



[BUG]: Duplicate values in unary ops with streaming executor and multiple partitions

This was caught by failures in `python/cudf_polars/tests/expressions/test_numeric_unaryops.py` with a modified `max_rows_per_partition`. We should ensure that tests creating their own dataframes have a sufficient number of rows to hit the multi-partition...
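A minimal sketch of the "enough rows to hit multiple partitions" idea, in plain Python. It assumes that partitioning splits a frame into fixed-size chunks of at most `max_rows_per_partition` rows. The helpers `partition_count` and `make_test_column` are hypothetical and are not part of `cudf_polars` or its test suite.

```python
import math


def partition_count(n_rows: int, max_rows_per_partition: int) -> int:
    """Partitions produced if rows are split into chunks of at most
    max_rows_per_partition (an assumption about the partitioning scheme)."""
    if max_rows_per_partition <= 0:
        raise ValueError("max_rows_per_partition must be positive")
    return max(1, math.ceil(n_rows / max_rows_per_partition))


def make_test_column(max_rows_per_partition: int, min_partitions: int = 3) -> list[int]:
    """Hypothetical test helper: build data long enough to span at least
    `min_partitions` partitions, so multi-partition code paths actually run."""
    n_rows = max_rows_per_partition * min_partitions
    data = list(range(n_rows))
    # Guard against test data that silently fits in a single partition.
    assert partition_count(len(data), max_rows_per_partition) >= min_partitions
    return data
```

Deriving the row count from `max_rows_per_partition`, rather than hard-coding a small literal, keeps a test exercising the multi-partition path when that option changes.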

dask with zarr as file i/o slower in v3 vs v2 package

@ilan-gold could you try to isolate the issue a bit? It'd be good to understand exactly what is slower: 1. dask with zarr-python 2.x vs. zarr-python 3.x (which I think...

ENH: Support Plugin Accessors Via Entry Points

Agreed this would need docs, but I'm generally +1 on using entry points rather than import-time side effects.

> Couldn't that be really expensive if lots of packages were installed...

ENH: Support Plugin Accessors Via Entry Points

Yeah, [xarray](https://docs.xarray.dev/en/latest/internals/how-to-add-new-backend.html), [fsspec](https://filesystem-spec.readthedocs.io/en/latest/developer.html), [pytest](https://docs.pytest.org/en/latest/how-to/writing_plugins.html#pip-installable-plugins) are a few. Entry points can be a good option anytime you have some sort of plugin system that requires coordinating how a "framework" (pandas in...

Consolidate test data someplace with clear geoparquet versioning

Adding (versioned) examples sounds good. I normally don't like large binary files in git repos, but I imagine that these can be very small: just the metadata (which is really...

Consolidate test data someplace with clear geoparquet versioning

https://github.com/stac-utils/stac-geoparquet-data has been created, if anyone wants to make a PR :)

Fixed group_by mean with missing values and multiple partitions

/merge
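For context on the bug class in the title, here is a generic sketch of how a grouped mean is usually made correct across partitions when values can be missing. Each partition emits a (sum, non-null count) pair per key, and the division happens once after combining. Averaging per-partition means, or counting nulls in the denominator, gives wrong answers. This is not the code from the PR; `partial_sums` and `combine_means` are hypothetical names.

```python
from collections import defaultdict


def partial_sums(rows):
    """Per-partition step: map each key to [sum, count] over non-null values."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        entry = acc[key]  # create the group even if all of its values are null
        if value is not None:
            entry[0] += value
            entry[1] += 1
    return dict(acc)


def combine_means(partials):
    """Combine step: add sums and counts across partitions, then divide once."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            total[key][0] += s
            total[key][1] += c
    # A group whose values are all null has no defined mean, so report None.
    return {key: (s / c if c else None) for key, (s, c) in total.items()}


partitions = [
    [("a", 1.0), ("a", None), ("b", None)],
    [("a", 3.0), ("a", 5.0), ("b", 4.0), ("c", None)],
]
result = combine_means(partial_sums(p) for p in partitions)
```

With this data, key `"a"` has non-null values 1, 3, and 5, so the correct mean is 3.0. A mean of the partition means would give 2.5, and dividing by all four rows, nulls included, would give 2.25.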

[BUG]: Incorrect result for `rolling` with experimental streaming executor and multiple partitions

This probably affects window operations too, e.g. those in `python/cudf_polars/tests/test_window_functions.py::test_rolling[agg_expr0-2d]`
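As a generic illustration of why rolling and window operations need special care with multiple partitions: a trailing window that straddles a partition boundary needs rows from the previous partition. Computing each partition independently produces wrong values at the start of every partition after the first. A common remedy is to prepend a "halo" of the previous `window - 1` rows before computing, then drop those rows from the output. This is a plain-Python sketch of that technique, not the cudf_polars implementation, and the function names are hypothetical.

```python
def rolling_sum(values, window):
    """Trailing rolling sum; positions with fewer than `window` values get None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]))
    return out


def rolling_sum_naive(partitions, window):
    """Buggy: each partition ignores the rows that precede it."""
    return [x for part in partitions for x in rolling_sum(part, window)]


def rolling_sum_partitioned(partitions, window):
    """Prepend a halo of up to window - 1 earlier rows to each partition."""
    out = []
    carry = []  # trailing rows carried over from earlier partitions
    for part in partitions:
        extended = carry + part
        out.extend(rolling_sum(extended, window)[len(carry):])
        carry = extended[-(window - 1):] if window > 1 else []
    return out
```

For `[[1, 2, 3], [4, 5, 6]]` with `window=2`, the naive version yields `None` at the first row of the second partition, where the single-partition answer is 7.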

pd.crosstab, categorical data and missing instances

The docstring even has an example further down:

```python
>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> crosstab(foo, bar)  # 'c'...
```

pd.crosstab, categorical data and missing instances

It seems like the resolution from https://github.com/pandas-dev/pandas/issues/12298 was that all the categories should be present in the output. https://github.com/pandas-dev/pandas/pull/15511 seems to go against that... So I think this is a...
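If every declared category should appear in the output, as the resolution of #12298 suggests, one explicit way to guarantee that is to reindex the crosstab against the categories and fill unseen combinations with 0. This does not depend on how a given pandas version handles unobserved categories inside `crosstab`. It is a workaround sketch built on `pd.crosstab` and `DataFrame.reindex`, not the fix discussed in either linked issue.

```python
import pandas as pd

foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])

# Count co-occurrences, then reindex so every declared category is present,
# filling combinations that never occur (e.g. row 'c', column 'f') with 0.
table = pd.crosstab(foo, bar).reindex(
    index=list(foo.categories), columns=list(bar.categories), fill_value=0
)
print(table)
```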