pandas2
Design documents and code for the pandas 2.0 effort.
xref #15 I brought this up at SciPy 2015, but there's a significant performance win available in expressions like:

```
df[boolean_cond].groupby(grouping_exprs).agg(agg_expr)
```

If you do this currently, it will produce...
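As a rough illustration, and assuming part of the cost comes from materializing every column of `df[boolean_cond]` before grouping, the sketch below prunes the frame to the columns the aggregation actually needs before filtering. The column names `key`, `val`, and `other` are hypothetical, and this is manual column pruning done by the user, not the optimization the issue proposes.

```python
import pandas as pd

# Hypothetical example frame; "other" is a column the aggregation never uses.
df = pd.DataFrame({
    "key": ["x", "y", "x", "y"],
    "val": [1, 2, 3, 4],
    "other": [0.0, 0.0, 0.0, 0.0],
})
mask = df["val"] > 1

# Naive form: the boolean filter copies every column, including "other".
naive = df[mask].groupby("key")["val"].sum()

# Pruned form: select only the grouping and aggregated columns while filtering,
# so the intermediate frame carries two columns instead of three.
pruned = df.loc[mask, ["key", "val"]].groupby("key")["val"].sum()

print(naive.equals(pruned))  # True
```

A query planner could apply this projection automatically; done by hand, it only pays off when the unused columns are large.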
pandas's row indexes introduce a level of semantic incompatibility with other systems that occasionally causes problems for users who work with both pandas and some other system. Functionally, this mainly...
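One concrete source of the mismatch, offered as an illustration rather than as the issue's full argument: pandas aligns operands on their index labels, whereas positional systems combine values element by element. The Series values below are arbitrary.

```python
import numpy as np
import pandas as pd

# Two Series whose row labels overlap only partially.
s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([10, 20, 30], index=[1, 2, 3])

# pandas aligns on labels: unmatched labels produce NaN and the dtype becomes float64.
aligned = s1 + s2
print(aligned.tolist())  # [nan, 12.0, 23.0, nan]

# A positional system (here plain NumPy) simply adds element by element.
positional = np.asarray(s1) + np.asarray(s2)
print(positional.tolist())  # [11, 22, 33]
```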
Potentially related to #9: numpy ufuncs have an identity, which `pandas` follows with respect to missing data.

```
np.sum([], dtype='float64')
Out[33]: 0.0

np.nansum([np.nan], dtype='float64')
Out[35]: 0.0

pd.Series([np.nan]).sum()
Out[36]: 0.0
```
...
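The behavior can be checked directly. The sketch also shows the `min_count` parameter, which later pandas releases added to `sum` and `prod` so that an all-missing input can return NA instead of the identity; it is included as one existing knob, not as the resolution of this issue.

```python
import numpy as np
import pandas as pd

# The additive identity makes empty and all-NaN sums return 0.0.
print(np.sum([], dtype="float64"))           # 0.0
print(np.nansum([np.nan]))                   # 0.0
print(pd.Series([np.nan]).sum())             # 0.0

# The multiplicative identity plays the same role for products.
print(pd.Series([np.nan]).prod())            # 1.0

# min_count requires at least that many non-NA values; otherwise the result is NA.
print(pd.Series([np.nan]).sum(min_count=1))  # nan
```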
This may not actually be an issue, as we aren't using float `np.nan` as our missing marker, but we tend to have some subtle issues when int64 values are downcast to...
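A minimal illustration, assuming the conversion in question ends at float64 (the usual result when missing values enter a NumPy-backed int64 column): integers above 2**53 cannot survive the round trip. The nullable `Int64` dtype at the end is a later pandas feature, shown only for comparison.

```python
import pandas as pd

big = 2**53 + 1  # smallest positive integer that float64 cannot represent exactly
s = pd.Series([big, 1], dtype="int64")

# Reindexing introduces a missing row; with NumPy-backed int64 the Series becomes float64.
widened = s.reindex([0, 1, 2])
print(widened.dtype)             # float64
print(int(widened[0]) == big)    # False: the value was rounded to 2**53

# The nullable Int64 extension dtype keeps exact integers alongside <NA>.
nullable = s.astype("Int64").reindex([0, 1, 2])
print(nullable.dtype)            # Int64
print(int(nullable[0]) == big)   # True
```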
pandas `dtype` coercion is useful in most cases, but in some situations it should be prohibited to avoid unexpected values being included. The behavior should be switchable like: ``` s =...
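Pending a built-in switch, a strict assignment can be approximated outside pandas. The sketch below uses a hypothetical helper, `set_strict` (not a pandas API), that performs the assignment on a copy and rejects it if the dtype would change. It accounts for pandas versions that upcast silently, versions that warn before upcasting, and versions that refuse the assignment outright.

```python
import warnings
import pandas as pd

def set_strict(s: pd.Series, key, value) -> pd.Series:
    """Hypothetical helper: return a copy with s[key] = value, refusing any dtype coercion."""
    out = s.copy()
    with warnings.catch_warnings():
        # Some pandas versions emit a FutureWarning before upcasting; the dtype check handles it.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            out[key] = value
        except (TypeError, ValueError) as exc:
            # Newer pandas versions may reject the incompatible value directly.
            raise TypeError(f"cannot set {value!r} without changing dtype {s.dtype}") from exc
    if out.dtype != s.dtype:
        raise TypeError(f"assignment would coerce {s.dtype} to {out.dtype}")
    return out

s = pd.Series([1, 2, 3])
print(set_strict(s, 0, 10).tolist())  # [10, 2, 3]
try:
    set_strict(s, 0, 1.5)
except TypeError as err:
    print("rejected:", err)
```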
Currently, there is no way to index with a list of values in pandas without inserting NA for missing values. It would be nice to make this possible, either by making...
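Until such an option exists, the workaround is to filter the requested labels first. A minimal sketch contrasting NA insertion (shown here with `reindex`) with an explicit filter that keeps the requested order; the data and variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
labels = ["c", "x", "a"]  # "x" is not in the index

# reindex inserts NA for the missing label, which also upcasts int64 to float64.
with_na = s.reindex(labels)
print(with_na.tolist())  # [30.0, nan, 10.0]

# Keeping only labels that exist (in the requested order) avoids the NA row.
present = [label for label in labels if label in s.index]
subset = s.loc[present]
print(subset.tolist())   # [30, 10]
```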
The rules for exactly what `DataFrame.__getitem__`/`__setitem__` do (pydata/pandas#9595) are so complex and inconsistent that they cannot be understood without extensive experimentation. This makes for a rather embarrassing situation that...
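A few of the cases that `DataFrame.__getitem__` dispatches on fit in a handful of lines. This is only a sample of the rules discussed in pydata/pandas#9595, not a complete description; note in particular that an integer slice selects rows by position while an integer scalar is looked up as a column label.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col = df["a"]                 # scalar label -> one column, as a Series
cols = df[["a", "b"]]         # list of labels -> several columns
rows = df[0:2]                # slice -> rows, by position
filtered = df[df["a"] > 1]    # boolean Series -> rows

print(type(col).__name__, cols.shape, rows.shape, filtered.shape)
# Series (3, 2) (2, 2) (2, 2)

# An integer scalar is treated as a column label, not a row position.
try:
    df[0]
except KeyError:
    print("df[0] -> KeyError")
```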
Similar to the `ARRAY` type found in SQL variants with nested types. See also the `List` type in Apache Arrow. xref pydata/pandas#8517
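For comparison, the closest thing pandas offers without a dedicated type is an object column holding Python lists. The sketch shows that workaround (plus `Series.explode`, added in a later pandas release); it is not a sketch of the proposed type. Arrow's `List` type instead stores all elements in one typed child array plus offsets, rather than as separate boxed Python objects.

```python
import pandas as pd

# Without a nested type, lists end up as boxed Python objects in an object column.
s = pd.Series([[1, 2], [3]])
print(s.dtype)              # object
print(s.map(len).tolist())  # [2, 1]

# explode flattens one level, repeating the outer index for each element.
flat = s.explode()
print(flat.tolist())        # [1, 2, 3]
print(list(flat.index))     # [0, 0, 1]
```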
`.lookup` (https://github.com/pydata/pandas/issues/7138), for coordinate access, is useful but is not incorporated into a generalized indexer.
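`DataFrame.lookup` was later deprecated and removed from pandas. The sketch below shows pairwise (row, column) label access built from `Index.get_indexer` and NumPy fancy indexing; the frame and label lists are illustrative. The check for -1 matters because `get_indexer` marks missing labels with -1, which NumPy would otherwise treat as "last element".

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])
row_labels = ["a", "c", "b"]
col_labels = ["y", "x", "y"]

# Translate labels to integer positions (-1 means "not found").
row_pos = df.index.get_indexer(row_labels)
col_pos = df.columns.get_indexer(col_labels)
if (row_pos == -1).any() or (col_pos == -1).any():
    raise KeyError("one or more labels not found")

# Pair the positions element-wise with NumPy fancy indexing on the values.
values = df.to_numpy()[row_pos, col_pos]
print(values.tolist())  # [4, 3, 5]
```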
Copying my comment from https://github.com/pydata/pandas/issues/10000#issuecomment-236238297: We should consider making arithmetic between a Series and a DataFrame broadcast across the columns of the dataframe, i.e., aligning `series.index` with `df.index`, rather than...
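The current default and the existing explicit alternative can be compared on a small frame. `DataFrame.add(..., axis=0)` already provides the row-aligned behavior that the comment proposes as the default for `+`; the labels below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])
s = pd.Series([10, 20], index=["r1", "r2"])

# Default: s.index is aligned with df.columns, so nothing matches and every cell is NaN.
by_columns = df + s
print(by_columns.columns.tolist())    # ['a', 'b', 'r1', 'r2']
print(by_columns.isna().all().all())  # True

# Explicit axis=0 aligns s.index with df.index, broadcasting across the columns.
by_rows = df.add(s, axis=0)
print(by_rows.to_dict("list"))        # {'a': [11, 22], 'b': [13, 24]}
```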