pandas2 issues

Unified merge API

We have `merge()` and `merge_asof()`. There may even come a time when we perform functions on overlapping columns. As someone who wants to join two tables together, I just want...

chrisaycock

API
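
To make the split between the two existing entry points concrete, here is a minimal sketch that calls both `pd.merge()` and `pd.merge_asof()` on the same input. The `trades`, `quotes`, and `names` tables are made-up illustration data, not taken from the issue.

```python
import pandas as pd

# Hypothetical example tables (not from the issue).
trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-01 09:30:01", "2016-01-01 09:30:03"]),
    "ticker": ["AAPL", "AAPL"],
    "price": [100.0, 101.0],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-01 09:30:00", "2016-01-01 09:30:02"]),
    "ticker": ["AAPL", "AAPL"],
    "bid": [99.5, 100.5],
})
names = pd.DataFrame({"ticker": ["AAPL"], "name": ["Apple"]})

# Exact-key join: every trade picks up the row with the same ticker.
exact = pd.merge(trades, names, on="ticker", how="left")

# As-of join: every trade picks up the most recent quote at or before
# its timestamp, matched within the same ticker. Both inputs must be
# sorted by the "time" key.
asof = pd.merge_asof(trades, quotes, on="time", by="ticker")
```

The two calls answer the same user question ("join these tables") but live behind different functions with different keyword arguments, which is the situation the issue describes.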

Use all aligned, 128 to 512-bit memory allocations

See https://github.com/pydata/pandas/issues/3146. I have closed that issue, but we should do this when implementing the pandas 2.0 memory allocator.

wesm

performance
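
As a rough illustration of what an aligned allocation means, the sketch below gets a 64-byte (512-bit) aligned NumPy array from Python by over-allocating and slicing. The helper name `aligned_empty` is hypothetical, and this is only a workaround at the NumPy level, not the allocator proposed in the issue.

```python
import numpy as np

def aligned_empty(n, dtype=np.float64, alignment=64):
    """Return an uninitialized 1-D array whose data pointer is a
    multiple of `alignment` bytes (hypothetical helper)."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate so that an aligned start address must exist inside.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-buf.ctypes.data) % alignment
    # The slice keeps a reference to `buf`, so the memory stays alive.
    return buf[offset:offset + nbytes].view(dtype)

arr = aligned_empty(1000)
```

A dedicated allocator would avoid the wasted padding bytes; the point here is only to show the address arithmetic involved.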

Much faster to_csv implementation (in libpandas)

https://github.com/pydata/pandas/issues/3186

wesm

performance

IO

More careful management of hash table allocations

1

Per https://github.com/pydata/pandas/issues/4491, we may consider a fixed-size memory pool (which could be managed with an LRU stack) for hash table data, to avoid holding excess internal index hash tables.

wesm

memory-use
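
The fixed-size pool with LRU eviction mentioned above could look roughly like the following. This is a pure-Python sketch under stated assumptions: the class name `HashTablePool`, its `get` signature, and the caller-supplied size estimate are all hypothetical, and a real implementation would live in the C/C++ layer.

```python
from collections import OrderedDict

class HashTablePool:
    """Hypothetical bounded pool of hash tables with LRU eviction."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Maps key -> (table, nbytes); oldest entries come first.
        self._tables = OrderedDict()

    def get(self, key, build, nbytes):
        """Return the cached table for `key`, building it on a miss.

        `build` is a zero-argument callable; `nbytes` is the caller's
        estimate of the table's size.
        """
        if key in self._tables:
            # Cache hit: mark as most recently used.
            self._tables.move_to_end(key)
            return self._tables[key][0]
        table = build()
        self._tables[key] = (table, nbytes)
        self.used_bytes += nbytes
        # Evict least recently used tables until under budget, but
        # always keep the table that was just requested.
        while self.used_bytes > self.max_bytes and len(self._tables) > 1:
            _, (_, freed) = self._tables.popitem(last=False)
            self.used_bytes -= freed
        return table

    def __contains__(self, key):
        return key in self._tables

pool = HashTablePool(max_bytes=100)
pool.get("a", dict, 60)
pool.get("b", dict, 60)   # exceeds the budget, so "a" is evicted
```

An evicted table is simply rebuilt on the next lookup, trading some recomputation for a hard cap on memory held by index hash tables.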

Serializing more array metadata

If any array/Series statistics have been computed, we should serialize them: https://github.com/pydata/pandas/issues/1324

wesm

dtypes

Code review tools for libpandas

1

I don't think GitHub can support the level of code scrutiny that we're going to want as part of the pandas 2.0 development process, particularly for C/C++ code that may...

wesm

Improving groupby-apply microperformance

Consider the case of a DataFrame with a large number of distinct groups:

```
import numpy as np
import pandas as pd

arr = np.random.randn(5000000)
df = pd.DataFrame({'group': arr.astype('str').repeat(2)})
df['values'] = np.random.randn(len(df))
df.groupby('group').apply(lambda g:...
```

wesm

performance
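
To show where the per-group overhead comes from, here is a scaled-down sketch of the same shape of workload. The function applied in the issue is truncated, so a simple per-group sum is assumed here purely for illustration; `group_sum` and `call_count` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Many small groups: 1000 distinct keys, two rows each
# (far smaller than the 5,000,000 keys in the issue).
rng = np.random.default_rng(0)
n_groups = 1000
keys = np.arange(n_groups).astype(str).repeat(2)
df = pd.DataFrame({"group": keys, "values": rng.standard_normal(len(keys))})

call_count = {"n": 0}

def group_sum(values):
    # Assumed stand-in for the truncated lambda in the issue.
    call_count["n"] += 1
    return values.sum()

# apply() invokes the Python function once per group (some pandas
# versions call it an extra time on the first group).
via_apply = df.groupby("group")["values"].apply(group_sum)

# The built-in aggregation computes all group sums in compiled code.
via_builtin = df.groupby("group")["values"].sum()
```

Both paths produce the same numbers, but the `apply` path pays Python-level call and object-construction costs for every group, which is the microperformance cost the issue is about.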

Iterative changes instead of all of them in 2.0 version?

3

(Moved from https://github.com/wesm/pandas2-design/issues/1) Disclaimer: I'm not involved in pandas development so my opinion here is not very informed. Sorry about that. :-/ According to my (little) experience in software development,...

dukebody
