Tom Augspurger
This was caught by failures in `python/cudf_polars/tests/expressions/test_numeric_unaryops.py` with a modified `max_rows_per_partition`. We should ensure that tests creating their own dataframes have a sufficient number of rows to hit the multi-partition...
@ilan-gold could you try to isolate the issue a bit? It'd be good to understand exactly what is slower: 1. dask with zarr-python 2.x vs. zarr-python 3.x (which I think...
Agreed this would need docs, but I'm generally +1 on using entry points rather than import-time side effects. > Couldn't that be really expensive if lots of packages were installed...
Yeah, [xarray](https://docs.xarray.dev/en/latest/internals/how-to-add-new-backend.html), [fsspec](https://filesystem-spec.readthedocs.io/en/latest/developer.html), [pytest](https://docs.pytest.org/en/latest/how-to/writing_plugins.html#pip-installable-plugins) are a few. Entry points can be a good option anytime you have some sort of plugin system that requires coordinating how a "framework" (pandas in...
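For illustration, here is a minimal sketch of how a framework might discover plugins through entry points using only the standard library. The group name `myframework.plugins` and the helper names `discover_plugins` and `load_plugin` are hypothetical, not taken from any of the projects linked above. Discovery only reads installed package metadata; a plugin module is imported only when `load()` is called on its entry point, which is one way to keep the cost down when many packages are installed.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return {name: EntryPoint} for a group without importing any plugin code."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry_points() returns an object with .select()
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_plugin(group, name):
    """Import and return a single plugin object, only when it is requested."""
    plugins = discover_plugins(group)
    if name not in plugins:
        raise KeyError(f"no plugin named {name!r} registered under {group!r}")
    # .load() is the step that actually imports the plugin's module
    return plugins[name].load()


# Hypothetical group name; with no matching packages installed this is empty.
available = discover_plugins("myframework.plugins")
print(sorted(available))
```

A third-party package would register itself under the same group name in its packaging metadata, so the framework never has to import the package just to learn that it exists.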
Adding (versioned) examples sounds good. I normally don't like large binary files in git repos, but I imagine that these can be very small: just the metadata (which is really...
https://github.com/stac-utils/stac-geoparquet-data is created, if anyone wants to make a PR :)
This probably affects window operations too, e.g. those in `python/cudf_polars/tests/test_window_functions.py::test_rolling[agg_expr0-2d]`
The docstring even has an example further down ```python >>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c']) >>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f']) >>> crosstab(foo, bar) # 'c'...
It seems like the resolution from https://github.com/pandas-dev/pandas/issues/12298 was that all the categories should be present in the output. https://github.com/pandas-dev/pandas/pull/15511 seems to go against that... So I think this is a...