pandas2
Design documents and code for the pandas 2.0 effort.
We have `merge()` and `merge_asof()`. There may even come a time when we perform functions on overlapping columns. As someone who wants to join two tables together, I just want...
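For readers less familiar with the two functions named above, here is a minimal sketch of how they differ. The tables, column names, and values are invented for illustration and do not come from the issue.

```python
import pandas as pd

# Hypothetical example data: two trades and two quotes for one ticker
trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-01 09:00:01", "2016-01-01 09:00:03"]),
    "ticker": ["AAA", "AAA"],
    "price": [10.0, 10.5],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-01 09:00:00", "2016-01-01 09:00:02"]),
    "ticker": ["AAA", "AAA"],
    "bid": [9.9, 10.4],
})

# Exact-key join: no rows match because no timestamps are identical
exact = pd.merge(trades, quotes, on=["time", "ticker"], how="inner")

# As-of join: each trade picks up the most recent quote at or before its time
# (both inputs must already be sorted by the "on" column)
asof = pd.merge_asof(trades, quotes, on="time", by="ticker")
```

The same pair of tables yields an empty result under `merge()` and a fully matched one under `merge_asof()`, so the caller currently has to pick the function that matches the kind of join they mean.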
See https://github.com/pydata/pandas/issues/3146. I have closed that issue, but we should do this when implementing the pandas 2.0 memory allocator.
https://github.com/pydata/pandas/issues/3186
Per https://github.com/pydata/pandas/issues/4491, we may consider a fixed-size memory pool (which could be managed with an LRU stack) for hash table data, to avoid accumulating excess internal index hash tables.
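One way to picture the fixed-size pool idea is the rough Python sketch below. It is not pandas code: `HashTablePool`, its `get` method, and the byte sizes are all hypothetical, and it uses `collections.OrderedDict` in place of a hand-written LRU stack.

```python
from collections import OrderedDict


class HashTablePool:
    """Hypothetical fixed-budget pool that evicts the least recently used table."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._tables = OrderedDict()  # key -> (table, nbytes), oldest first

    def get(self, key, build, nbytes):
        """Return the cached table for key, building it with build() on a miss."""
        if key in self._tables:
            self._tables.move_to_end(key)  # mark as most recently used
            return self._tables[key][0]
        table = build()
        # Evict least recently used tables until the new one fits the budget
        while self._tables and self.used_bytes + nbytes > self.max_bytes:
            _, (_, freed) = self._tables.popitem(last=False)
            self.used_bytes -= freed
        # A table larger than the whole budget is returned but not cached
        if nbytes <= self.max_bytes:
            self._tables[key] = (table, nbytes)
            self.used_bytes += nbytes
        return table

    def __contains__(self, key):
        return key in self._tables


# Usage with made-up sizes: the second table does not fit alongside the first
pool = HashTablePool(max_bytes=100)
pool.get("a", lambda: {"x": 0}, nbytes=60)
pool.get("b", lambda: {"y": 0}, nbytes=60)  # evicts "a" to stay under budget
```

With a cap like this, an evicted index table would simply be rebuilt on its next use, trading some recomputation for a bounded memory footprint.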
If any array/Series statistics have been computed, we should serialize them: https://github.com/pydata/pandas/issues/1324
I don't think GitHub can support the level of code scrutiny that we're going to want as part of the pandas 2.0 development process, particularly for C/C++ code that may...
Consider the case of a DataFrame with a large number of distinct groups:

```
import numpy as np
import pandas as pd

arr = np.random.randn(5000000)
df = pd.DataFrame({'group': arr.astype('str').repeat(2)})
df['values'] = np.random.randn(len(df))
df.groupby('group').apply(lambda g:...
```
(Moved from https://github.com/wesm/pandas2-design/issues/1) Disclaimer: I'm not involved in pandas development so my opinion here is not very informed. Sorry about that. :-/ According to my (little) experience in software development,...