pandas2

Design documents and code for the pandas 2.0 effort.

Results 58 pandas2 issues

xref #15

I brought this up at SciPy 2015, but there's a significant performance win available in expressions like:

```
df[boolean_cond].groupby(grouping_exprs).agg(agg_expr)
```

If you do this currently, it will produce...

performance
memory-use
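
As a rough illustration of the filter-then-group expression in the issue above, the sketch below runs it on a small frame and compares it with a variant that masks a single column instead of selecting rows from the whole frame. The column names and data are hypothetical, and the masked variant is only one possible workaround, not the approach the issue proposes.

```python
import pandas as pd

# Hypothetical data; "key" and "value" are illustrative column names.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})
mask = df["value"] > 1.5

# Pattern from the issue: boolean selection first, then groupby/agg.
# df[mask] is a separate DataFrame holding every column of the kept rows.
filtered = df[mask].groupby("key")["value"].sum()

# Variant: zero out the excluded rows of one column and group the result,
# so only that column is copied rather than the whole frame.
masked = df["value"].where(mask, 0.0).groupby(df["key"]).sum()

print(filtered)
print(masked)
```

The two results agree here only because every group keeps at least one row; a group whose rows are all filtered out would vanish from the first result but appear with a sum of 0 in the second.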

pandas's row indexes introduce a level of semantic incompatibility with other systems that occasionally causes problems for users who are using both pandas and some other system. Functionally, this mainly...

API

Potentially related to #9.

numpy ufuncs have an identity, which `pandas` follows with respect to missing data.

```
np.sum([], dtype='float64')
Out[33]: 0.0
np.nansum([np.nan], dtype='float64')
Out[35]: 0.0
pd.Series([np.nan]).sum()
Out[36]: 0.0
```
...

missing data
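
The identity behaviour quoted in the issue above can be checked with a short script. The `min_count` argument used at the end is a parameter of `Series.sum` in later pandas releases; it is not mentioned in the issue and is shown only to contrast the two possible answers for an all-NA sum.

```python
import numpy as np
import pandas as pd

# The additive identity is returned when there is nothing to sum.
empty_sum = np.sum([], dtype="float64")
nan_sum = np.nansum([np.nan], dtype="float64")

# pandas skips NA by default, so an all-NA Series sums like an empty one.
all_nan_default = pd.Series([np.nan]).sum()

# Requiring at least one valid value yields NaN instead of the identity.
all_nan_strict = pd.Series([np.nan]).sum(min_count=1)

print(empty_sum, nan_sum, all_nan_default, all_nan_strict)
```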

This may not actually be an issue, since we aren't using float `np.nan` as our missing marker, but we tend to have some subtle issues when int64 values are downcast to...

dtypes

pandas `dtype` coercion is useful in most cases, but there are situations where it should be prohibited so that unexpected values are not silently included. The behavior should be switchable like: ``` s =...

dtypes
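
To make the coercion concern above concrete, the sketch below shows one common case (an integer Series widened to float64 when a reindex introduces a missing label) together with a strict wrapper that refuses the change. `reindex_strict` is a hypothetical helper written for this illustration; it is not a pandas API and not the switch the issue proposes.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Label 5 does not exist, so NaN is inserted and int64 is coerced to float64.
widened = s.reindex([0, 1, 5])
print(s.dtype, "->", widened.dtype)


def reindex_strict(series, labels):
    """Hypothetical helper: reindex, but refuse any change of dtype."""
    result = series.reindex(labels)
    if result.dtype != series.dtype:
        raise TypeError(
            f"reindex would coerce {series.dtype} to {result.dtype}"
        )
    return result


# All labels exist, so the dtype is preserved and the call succeeds.
kept = reindex_strict(s, [0, 1])

# A missing label would force coercion, so the strict variant raises.
try:
    reindex_strict(s, [0, 5])
    raised = False
except TypeError:
    raised = True
```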

Currently, there is no way to index a list of values in pandas without inserting NA for missing values. It could be nice to make this possible, either by making...

indexing
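
A small sketch of the distinction raised above, with hypothetical labels and data: `reindex` inserts NA for labels that are absent, while filtering on `Index.isin` keeps only the labels that exist. How label-based indexers treat missing labels has varied across pandas versions, so this is an illustration rather than a statement of what any given release does.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
wanted = ["a", "c", "z"]  # "z" is not in the index

# reindex fills the missing label with NA (and coerces int64 to float64).
with_na = s.reindex(wanted)

# Keeping only labels that are present avoids inserting NA.
present_only = s[s.index.isin(wanted)]

print(with_na)
print(present_only)
```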

The rules for exactly what `DataFrame.__getitem__`/`__setitem__` does (pydata/pandas#9595) are sufficiently complex and inconsistent that they are impossible to understand without extensive experimentation. This makes for a rather embarrassing situation that...

indexing
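
As a brief illustration of why the `__getitem__` rules above are hard to predict, the sketch below passes four different kinds of key to `df[...]`; two select columns and two select rows. The frame is hypothetical, and this covers only a few of the cases the issue refers to.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

by_label = df["x"]           # a single label selects a column (Series)
by_list = df[["x"]]          # a list of labels selects columns (DataFrame)
by_slice = df[0:2]           # a slice selects rows
by_mask = df[df["x"] > 1]    # a boolean Series selects rows

print(type(by_label).__name__, by_list.shape, by_slice.shape, by_mask.shape)
```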

Similar to the `ARRAY` type found in SQL variants with nested types. See also the `List` type in Apache Arrow. xref pydata/pandas#8517

dtypes

`.lookup` (https://github.com/pydata/pandas/issues/7138) is useful for coordinate access, but it is not incorporated into a generalized indexer.

indexing
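
For readers unfamiliar with coordinate access, the sketch below picks one value per (row label, column label) pair. `DataFrame.lookup` was removed in later pandas releases, so the example uses positional indexers on the underlying array instead; the data and labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# Coordinates to fetch: (a, y), (c, x), (b, y).
row_labels = ["a", "c", "b"]
col_labels = ["y", "x", "y"]

# Translate labels to integer positions, then index the array pairwise.
row_pos = df.index.get_indexer(row_labels)
col_pos = df.columns.get_indexer(col_labels)
values = df.to_numpy()[row_pos, col_pos]

print(values)
```

Note that `get_indexer` returns -1 for a label that is not found, which would silently select the last row or column, so real code should check for -1 first.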

Copying my comment from https://github.com/pydata/pandas/issues/10000#issuecomment-236238297: We should consider making arithmetic between a Series and a DataFrame broadcast across the columns of the dataframe, i.e., aligning `series.index` with `df.index`, rather than...

indexing
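
The two alignment choices discussed above can be compared on a small example. With the `+` operator, pandas aligns `series.index` against the DataFrame's columns; `DataFrame.add(..., axis=0)` aligns it against `df.index` instead. The data below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
s = pd.Series([10, 20], index=["a", "b"])

# Operator form: s.index is matched against df.columns. No labels overlap,
# so the result has the union of labels as columns and every value is NaN.
by_columns = df + s

# Method form with axis=0: s.index is matched against df.index,
# so each row of df has the corresponding element of s added to it.
by_rows = df.add(s, axis=0)

print(by_columns)
print(by_rows)
```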