Wes McKinney
Similar to the `ARRAY` type found in SQL variants with nested types. See also the `List` type in Apache Arrow. xref pydata/pandas#8517
See https://github.com/pydata/pandas/issues/3146. I have closed that issue, but we should do this when implementing the pandas 2.0 memory allocator.
https://github.com/pydata/pandas/issues/3186
Per https://github.com/pydata/pandas/issues/4491, we may consider a fixed-size memory pool (which could be managed with an LRU stack) for hash table data, to avoid excess internal index hash tables.
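To make the LRU idea concrete, here is a minimal sketch of a bounded pool that evicts the least recently used hash table. It is not pandas code: the `HashTablePool` class, its `get` method, and the `build` callback are hypothetical names, and the pool is bounded by entry count rather than by bytes to keep the example short.

```python
from collections import OrderedDict


class HashTablePool:
    """Hypothetical bounded pool: keeps at most `capacity` hash tables."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._tables = OrderedDict()  # oldest entry first, newest last

    def get(self, key, build):
        """Return the table for `key`, building it with `build()` on a miss."""
        if key in self._tables:
            # Cache hit: mark this table as the most recently used.
            self._tables.move_to_end(key)
            return self._tables[key]
        table = build()
        self._tables[key] = table
        if len(self._tables) > self.capacity:
            # Over capacity: drop the least recently used table.
            self._tables.popitem(last=False)
        return table

    def __contains__(self, key):
        return key in self._tables

    def __len__(self):
        return len(self._tables)


pool = HashTablePool(capacity=2)
pool.get("index_a", lambda: {"x": 0})
pool.get("index_b", lambda: {"y": 1})
pool.get("index_a", lambda: {"x": 0})  # hit: refreshes index_a
pool.get("index_c", lambda: {"z": 2})  # miss: evicts index_b
```

A real allocator would more likely track the memory footprint of each table and evict until the pool fits its byte budget; the eviction order would be the same.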
If any array/Series statistics have been computed, we should serialize them: https://github.com/pydata/pandas/issues/1324
I don't think GitHub can support the level of code scrutiny that we're going to want as part of the pandas 2.0 development process, particularly for C/C++ code that may...
Consider the case of a DataFrame with a large number of distinct groups:

```
import numpy as np
import pandas as pd

arr = np.random.randn(5000000)
df = pd.DataFrame({'group': arr.astype('str').repeat(2)})
df['values'] = np.random.randn(len(df))
df.groupby('group').apply(lambda g:...
```
hi all, great to see some continued work on this project after the original discussion from last year. I still think it's useful to allow libraries to "throw data over...
There was a question on the sync call today about defining "what is a data frame?". People may have different perspectives, but I wanted to offer mine: --- A "data...
Based on https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/267, we are discussing a "protocol" method (potentially called `__dataframe__`) similar to `__array__` for data frame-like data. The consensus so far is that this protocol should not force...
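As a rough illustration of how a `__dataframe__` hook could mirror `__array__`, the sketch below has a producer expose its data through the method and a consumer look for the method instead of checking concrete types. Everything here is an assumption for illustration: the `ColumnStore` class, the `to_columns` helper, and the choice of a plain dict of lists as the exchange object are hypothetical and do not reflect whatever the discussion settles on.

```python
class ColumnStore:
    """Hypothetical data frame-like producer."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __dataframe__(self):
        # Assumption: the exchange object is a dict mapping column name
        # to a list of values. The real protocol may return something else.
        return {name: list(values) for name, values in self._columns.items()}


def to_columns(obj):
    """Hypothetical consumer: accept anything that implements the hook."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError("object does not implement __dataframe__")
    return obj.__dataframe__()


store = ColumnStore({"a": [1, 2], "b": ["x", "y"]})
columns = to_columns(store)
```

The consumer never imports the producer's library, which is the point of a protocol method: either side can change its implementation as long as the hook keeps returning the agreed exchange object.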