pandas2 issues

"Predicate pushdown" in group-bys

2

xref #15. I brought this up at SciPy 2015, but there's a significant performance win available in expressions like:

```
df[boolean_cond].groupby(grouping_exprs).agg(agg_expr)
```

If you do this currently, it will produce...

wesm

performance

memory-use
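The expression in the "Predicate pushdown" entry above can be made concrete with a small sketch. The column names (`key`, `value`, `other`, `flag`) and the manual column-pruning workaround are assumptions for illustration only; this is not the mechanism proposed in the issue, just a way to see that the filter and the aggregation need fewer columns than the eager form copies.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    "other": ["p", "q", "r", "s", "t"],  # never used by the aggregation
    "flag": [True, False, True, True, False],
})

# Eager form: df[cond] builds a filtered copy of every column,
# including "other", before the group-by runs.
eager = df[df["flag"]].groupby("key")["value"].sum()

# Manual approximation of pushing the predicate down: restrict the
# filtered copy to the columns the group-by actually needs.
pruned = df.loc[df["flag"], ["key", "value"]].groupby("key")["value"].sum()
```

Both forms give the same per-group sums; the second only avoids copying columns that the aggregation never reads.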

Alternate groupby API that is more functionally consistent with databases or systems like dplyr

2

pandas's row indexes introduce a level of semantic incompatibility with other systems that occasionally causes problems for users who are using both pandas and some other system. Functionally, this mainly...

wesm

API
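As a rough illustration of the row-index difference mentioned in the groupby API entry above, the sketch below compares the default group-by result, which moves the group key into the index, with a flat table closer to what a SQL `GROUP BY` returns. The data and the output column name `total` are assumptions; `as_index=False` and named aggregation are existing pandas features, not the alternate API the issue asks for.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})

# Default behavior: the group key becomes the row index of the result.
indexed = df.groupby("key")["value"].sum()

# Flat result: the group key stays an ordinary column, as in a database.
flat = df.groupby("key", as_index=False).agg(total=("value", "sum"))
```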

Aggregation identity on entirely missing data

1

Potentially related to #9. NumPy ufuncs have an identity, which `pandas` follows with respect to missing data.

```
np.sum([], dtype='float64')
Out[33]: 0.0
np.nansum([np.nan], dtype='float64')
Out[35]: 0.0
pd.Series([np.nan]).sum()
Out[36]: 0.0
```

...

chris-b1

missing data
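For the aggregation identity entry above, one way to see both behaviors side by side is the `min_count` argument of `Series.sum`, which exists in current pandas releases. Whether this matches what the issue had in mind is an assumption; the sketch only shows the identity result next to a missing result.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan])

# Default: missing values are skipped, so the sum falls back to the
# additive identity.
identity_sum = s.sum()

# Requiring at least one valid value yields a missing result instead.
strict_sum = s.sum(min_count=1)
```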

dtype precision / conversions

1

This may not actually be an issue, since we aren't using float `np.nan` as our missing marker, but we tend to have some subtle issues when int64 values are downcast to...

jreback

dtypes

Dtype strict mode

4

pandas `dtype` coercion is useful in most cases, but there are situations where it should be prohibited so that unexpected values are not silently included. The behavior should be switchable like: ``` s =...

sinhrks

dtypes
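The strict-mode idea in the entry above can be sketched with a small helper. `strict_set` is a hypothetical function written for this illustration, not a pandas API and not the switch proposed in the issue; it assumes numeric dtypes and refuses any assignment that would force the Series to change dtype.

```python
import numpy as np
import pandas as pd


def strict_set(s, label, value):
    """Return a copy of `s` with `value` at `label`, refusing dtype changes.

    Hypothetical helper: assumes a numeric Series and a scalar value.
    """
    value_dtype = np.asarray(value).dtype
    # If combining the two dtypes gives something other than the Series
    # dtype, storing the value would require coercion.
    if np.result_type(s.dtype, value_dtype) != s.dtype:
        raise TypeError(
            f"cannot store {value!r} in a {s.dtype} Series without coercion"
        )
    out = s.copy()
    out.loc[label] = value
    return out


s = pd.Series([1, 2, 3], dtype="int64")
updated = strict_set(s, 1, 10)  # integer into int64: allowed

try:
    strict_set(s, 1, 1.5)       # float into int64: rejected
    rejected = False
except TypeError:
    rejected = True
```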

.loc without reindexing

Currently, there is no way to index a list of values in pandas without inserting NA for missing values. It could be nice to make this possible, either by making...

shoyer

indexing
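To make the ".loc without reindexing" entry above concrete, the sketch below contrasts `reindex`, which inserts NA for labels that are absent, with a filter that keeps only the labels that exist. The data and the `isin` workaround are assumptions for illustration, not the API change suggested in the issue.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
labels = ["a", "c", "z"]  # "z" is not in the index

# Reindexing inserts NaN for "z", which also forces a float dtype.
reindexed = s.reindex(labels)

# Keeping only the labels that are present avoids the inserted NA.
present = s[s.index.isin(labels)]
```

Note that the `isin` filter returns rows in the order of the original index, not in the order of `labels`.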

Simplifying indexing (DataFrame.getitem)

2

The rules for exactly what `DataFrame.__getitem__`/`__setitem__` does (pydata/pandas#9595) are sufficiently complex and inconsistent that they are impossible to understand without extensive experimentation. This makes for a rather embarrassing situation that...

shoyer

indexing
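A few cases show why the `DataFrame.__getitem__` rules in the entry above are hard to predict: depending on the type of the key, the same bracket syntax selects columns or rows. The data here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col = df["a"]               # a string selects one column (Series)
cols = df[["a"]]            # a list of strings selects columns (DataFrame)
rows = df[0:2]              # a slice selects rows, not columns
filtered = df[df["a"] > 1]  # a boolean Series also selects rows
```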

First class array/list type

2

Similar to the `ARRAY` type found in SQL variants with nested types. See also the `List` type in Apache Arrow. xref pydata/pandas#8517

wesm

dtypes
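For context on the array/list type entry above: without a dedicated list dtype, nested values end up in a generic `object` column, so pandas knows nothing about the element type. The sketch below only demonstrates that current state with made-up data; it does not show the proposed type.

```python
import pandas as pd

s = pd.Series([[1, 2], [3], []])

# The lists are stored as opaque Python objects.
stored_dtype = s.dtype

# Per-element operations fall back to Python-level calls on each value.
lengths = s.map(len)
```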

DataFrame.lookup() style indexing

`.lookup` (https://github.com/pydata/pandas/issues/7138) is useful for coordinate access, but it is not incorporated into a generalized indexer.

jreback

indexing
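The coordinate access described in the `.lookup` entry above pairs each row label with one column label. A minimal sketch of the same access using positional NumPy indexing follows; the data and variable names are assumptions, and this is a workaround rather than the generalized indexer the issue asks for.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["r0", "r1", "r2"],
)
rows = ["r0", "r2"]
cols = ["y", "x"]

# Translate labels to integer positions. get_indexer returns -1 for a
# label that is missing, so real code should check for that first.
row_pos = df.index.get_indexer(rows)
col_pos = df.columns.get_indexer(cols)

# Paired integer arrays pick one value per (row, column) coordinate.
picked = df.to_numpy()[row_pos, col_pos]
```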

Aligning Series.index with DataFrame.index in broadcasting operations

1

Copying my comment from https://github.com/pydata/pandas/issues/10000#issuecomment-236238297: We should consider making arithmetic between a Series and a DataFrame broadcast across the columns of the dataframe, i.e., aligning `series.index` with `df.index`, rather than...

shoyer

indexing
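The alignment question in the broadcasting entry above shows up in a two-row example. With the `+` operator, pandas aligns `series.index` against the columns of the DataFrame; aligning against `df.index` requires the explicit `axis=0` form. The data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
s = pd.Series([10, 20], index=["x", "y"])

# Operator form: s.index is matched against df.columns, nothing lines
# up, and the result is all NaN over the union of labels.
by_columns = df + s

# Explicit form: s.index is matched against df.index.
by_rows = df.add(s, axis=0)
```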
