
ENH: Pandas Tensor Data Type

Open bionicles opened this issue 1 year ago • 2 comments

Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

While reviewing the Arrow docs link from @WillAyd, I spotted this:

https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor

A tensor type is exactly what I was describing in Additional Context [1]. It would give pandas users a column dtype for large blocks of a single underlying type.

Feature Description

Support Arrow Tensor in Pandas

Python https://arrow.apache.org/docs/python/generated/pyarrow.Tensor.html#pyarrow.Tensor

Rust https://github.com/apache/arrow-rs/blob/3715d5447e468a5a4dc631ae9aafec706c57aa20/arrow/src/tensor.rs#L115

Alternative Solutions

Store everything in "object" columns:

>>> import numpy as np
>>> import pandas as pd
>>> x = {'hello': 'world'}
>>> y = np.ones(3)
>>> df = pd.DataFrame({'X': [x], 'Y': [y]})
>>> df
                    X                Y
0  {'hello': 'world'}  [1.0, 1.0, 1.0]
>>> df.dtypes
X    object
Y    object
dtype: object
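
A rough sketch of why this workaround is limiting, using made-up data and only NumPy and pandas: because the column is "object", pandas has no information about the shape or element type of the arrays, so any block-wise computation first has to rebuild a real ndarray by hand.

```python
import numpy as np
import pandas as pd

# Two rows, each holding a length-3 float array (illustrative data).
df = pd.DataFrame({"Y": [np.ones(3), np.zeros(3)]})

# pandas only sees opaque Python objects here.
print(df["Y"].dtype)

# To compute on the whole block, stack the per-row arrays into one
# 2D float64 array; this only works if every row has the same shape.
stacked = np.stack(df["Y"].to_numpy())
row_sums = stacked.sum(axis=1)
print(stacked.shape, stacked.dtype)
print(row_sums)
```

A dedicated tensor dtype would carry the shape and element type with the column, so the manual stacking step would not be needed.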

Additional Context

[1] https://github.com/pandas-dev/pandas/pull/58455#issuecomment-2161603939 onward

bionicles avatar Jun 13 '24 19:06 bionicles

cc @jbrockmendel. If pyarrow plans to support its compute functions for pyarrow.Tensor objects, this may be the appropriate 2D EA block backing for ArrowExtensionArray instead of pyarrow.Table

mroeschke avatar Jun 13 '24 23:06 mroeschke

I think the nullability bitmap for the extension array only applies to the entire datum itself, not to individual records within each struct

WillAyd avatar Jun 21 '24 12:06 WillAyd