Wes McKinney issues

Results: 59 issues of Wes McKinney

First class array/list type

Similar to the `ARRAY` type found in SQL variants with nested types. See also the `List` type in Apache Arrow. xref pydata/pandas#8517

dtypes
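As a rough illustration of the `List` layout the entry above points to, Apache Arrow stores a list column as one flat child array of values plus an offsets array marking where each row starts and ends. The sketch below mimics that layout in plain Python; the helper names `encode_lists` and `get_list` are hypothetical, and null handling (Arrow's validity bitmap) is left out.

```python
from typing import List, Sequence, Tuple


def encode_lists(rows: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Flatten variable-length rows into (offsets, values).

    Row i occupies values[offsets[i]:offsets[i + 1]], so there is always
    one more offset than there are rows.
    """
    offsets = [0]
    values: List[int] = []
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return offsets, values


def get_list(offsets: Sequence[int], values: Sequence[int], i: int) -> List[int]:
    """Recover row i from the flattened representation."""
    return list(values[offsets[i]:offsets[i + 1]])


offsets, values = encode_lists([[1, 2], [], [3, 4, 5]])
third_row = get_list(offsets, values, 2)
```

Because every row is a slice of one contiguous buffer, an empty list costs only a repeated offset and no per-row Python object is needed.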

Use all aligned, 128 to 512-bit memory allocations

See https://github.com/pydata/pandas/issues/3146. I have closed that issue, but we should do this when implementing the pandas 2.0 memory allocator.

performance
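One common way to obtain aligned memory without a custom allocator is to over-allocate and then skip forward to the first aligned address. The sketch below does this with the standard library only; `aligned_buffer` and `address_of` are hypothetical helpers, and this is not how a pandas 2.0 allocator would necessarily be written. An alignment of 16 bytes corresponds to 128 bits and 64 bytes to 512 bits.

```python
import ctypes


def address_of(view: memoryview) -> int:
    """Return the memory address of the first byte of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(view))


def aligned_buffer(nbytes: int, alignment: int = 64) -> memoryview:
    """Return a writable view of nbytes whose start address is aligned.

    Over-allocates by `alignment` bytes, then slices from the first
    address that is a multiple of `alignment`.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    raw = bytearray(nbytes + alignment)
    base = address_of(memoryview(raw))
    offset = (-base) % alignment  # bytes to skip to reach alignment
    # The returned view keeps `raw` alive for as long as it is referenced.
    return memoryview(raw)[offset:offset + nbytes]


buf = aligned_buffer(1000, alignment=64)  # 64 bytes = 512 bits
```

The cost of this approach is up to `alignment` wasted bytes per allocation, which is usually negligible for large column buffers.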

Much faster to_csv implementation (in libpandas)

https://github.com/pydata/pandas/issues/3186

performance

More careful management of hash table allocations

Per https://github.com/pydata/pandas/issues/4491, we may consider a fixed-size memory pool (which could be managed with an LRU stack) for hash table data to avoid excess internal index hash tables.

memory-use
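The LRU-managed pool idea in the entry above can be sketched as a bounded cache that drops the least recently used hash table once a limit is reached. This is a simplification under stated assumptions: the limit counts tables rather than bytes, and `HashTablePool` and `build_index_table` are hypothetical names, not pandas APIs.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, List


def build_index_table(labels: List[Hashable]) -> Dict[Hashable, int]:
    """Build a label -> position hash table, as an index lookup would need."""
    return {label: pos for pos, label in enumerate(labels)}


class HashTablePool:
    """Keep at most `max_tables` hash tables, evicting the least recently used."""

    def __init__(self, max_tables: int) -> None:
        if max_tables < 1:
            raise ValueError("max_tables must be at least 1")
        self.max_tables = max_tables
        self._tables: "OrderedDict[Hashable, dict]" = OrderedDict()

    def get(self, key: Hashable, build: Callable[[], dict]) -> dict:
        """Return the cached table for `key`, building it on a miss."""
        if key in self._tables:
            self._tables.move_to_end(key)  # mark as most recently used
            return self._tables[key]
        table = build()
        self._tables[key] = table
        while len(self._tables) > self.max_tables:
            self._tables.popitem(last=False)  # drop least recently used
        return table

    def keys(self) -> list:
        """Cached keys, ordered from least to most recently used."""
        return list(self._tables)


pool = HashTablePool(max_tables=2)
pool.get("a", lambda: build_index_table(["x", "y"]))
pool.get("b", lambda: build_index_table(["p", "q"]))
pool.get("a", lambda: build_index_table(["x", "y"]))  # hit: refreshes "a"
pool.get("c", lambda: build_index_table(["m", "n"]))  # miss: evicts "b"
```

An evicted table is simply rebuilt on the next lookup, so the pool trades occasional recomputation for a hard cap on the number of tables held in memory.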

Serializing more array metadata

If any array/Series statistics have been computed, we should serialize them: https://github.com/pydata/pandas/issues/1324

dtypes
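A minimal sketch of the idea in the entry above: statistics are cached when first computed, and only the ones that exist are written out alongside the data, so a reader does not have to recompute them. The `StatsArray` class and its JSON layout are assumptions made for illustration, not a pandas format.

```python
import json
from typing import Dict, Iterable, Optional


class StatsArray:
    """Array wrapper that caches computed statistics and serializes them."""

    def __init__(self, values: Iterable[float],
                 stats: Optional[Dict[str, float]] = None) -> None:
        self.values = list(values)
        self._stats: Dict[str, float] = dict(stats or {})

    def min(self) -> float:
        """Compute the minimum once, then reuse the cached value."""
        if "min" not in self._stats:
            self._stats["min"] = min(self.values)
        return self._stats["min"]

    @property
    def cached_stats(self) -> Dict[str, float]:
        """Statistics computed so far (a copy)."""
        return dict(self._stats)

    def to_json(self) -> str:
        """Serialize the values together with any computed statistics."""
        return json.dumps({"values": self.values, "stats": self._stats})

    @classmethod
    def from_json(cls, payload: str) -> "StatsArray":
        """Restore the values and previously computed statistics."""
        data = json.loads(payload)
        return cls(data["values"], data["stats"])


arr = StatsArray([3, 1, 2])
arr.min()  # populates the cache
restored = StatsArray.from_json(arr.to_json())
```

Any code that mutates `values` would also need to invalidate the cache; that step is omitted here.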

Code review tools for libpandas

I don't think GitHub can support the level of code scrutiny that we're going to want as part of the pandas 2.0 development process, particularly for C/C++ code that may...

Improving groupby-apply microperformance

Consider the case of a DataFrame with a large number of distinct groups:

```
import numpy as np
import pandas as pd

arr = np.random.randn(5000000)
df = pd.DataFrame({'group': arr.astype('str').repeat(2)})
df['values'] = np.random.randn(len(df))
df.groupby('group').apply(lambda g:...
```

performance

Some comments on interchange API from an Arrow developer

hi all, great to see some continued work on this project after the original discussion from last year. I still think it's useful to allow libraries to "throw data over...

interchange-protocol

Trying to define "data frame"

There was a question on the sync call today about defining "what is a data frame?". People may have different perspectives, but I wanted to offer mine: --- A "data...

Draft strawman data frame `__dataframe__` interchange / data export protocol for discussion

Based on https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/267, we are discussing a "protocol" method (potentially called `__dataframe__`) similar to `__array__` for data frame-like data. The consensus so far is that this protocol should not force...
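To make the analogy with `__array__` concrete, the sketch below shows a producer exposing a `__dataframe__` method and a consumer that relies only on that method rather than on the producer's type. The return value used here (a dict with `column_names` and `columns`) is invented for illustration and is not what the draft specifies; `ColumnStore` and `to_records` are hypothetical names.

```python
from typing import Any, Dict, List, Sequence


class ColumnStore:
    """Hypothetical data frame-like producer."""

    def __init__(self, columns: Dict[str, Sequence[Any]]) -> None:
        self._columns = {name: list(vals) for name, vals in columns.items()}

    def __dataframe__(self) -> Dict[str, Any]:
        # Assumed export shape, chosen only for this sketch.
        return {
            "column_names": list(self._columns),
            "columns": dict(self._columns),
        }


def to_records(obj: Any) -> List[Dict[str, Any]]:
    """Consume any object that implements __dataframe__, regardless of type."""
    export = getattr(obj, "__dataframe__", None)
    if export is None:
        raise TypeError("object does not implement __dataframe__")
    data = export()
    names = data["column_names"]
    rows = zip(*(data["columns"][name] for name in names))
    return [dict(zip(names, row)) for row in rows]


records = to_records(ColumnStore({"a": [1, 2], "b": ["x", "y"]}))
```

The consumer never imports the producer's library, which is the property such a protocol is meant to provide.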