TensorArray fails when used as boolean mask index

Open BryanCutler opened this issue 4 years ago • 2 comments

Pandas does not recognize an extension array that converts to a 1-D boolean NumPy array as a boolean mask, so such an array cannot be used for indexing.

import numpy as np
import pandas as pd
import text_extensions_for_pandas as tp

arr = tp.TensorArray(np.arange(20).reshape(10, 2))
s = pd.Series(arr)
thresh = s > 8
s[np.all(thresh.array, axis=1)]

This results in KeyError: "None of [Index([False, False, False, False, False, True, True, True, True, True], dtype='object')] are in the [index]" or other strange errors, because the mask is not picked up as a 1-D boolean array and Pandas instead tries to treat it as a list-like indexer or something else.

BryanCutler avatar Jan 14 '21 22:01 BryanCutler

In the notebook Text_Extensions_for_Pandas_Overview, an example uses a TensorArray as a boolean mask:

s[np.all(thresh.array, axis=1)]

This now fails when the Series tries to validate the mask. We need to find a fix or another way to do this.

This has been resolved in the notebook with a workaround. I wanted to leave this open because Pandas should be able to recognize an extension array that converts to a 1-D boolean array and use it as a boolean index.
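For anyone hitting the same error, one possible workaround is sketched below. It is not necessarily what the notebook does; it only illustrates the idea of coercing the mask to a plain 1-D NumPy boolean array before handing it to the Series. The helper name as_bool_mask is hypothetical, and a plain ndarray stands in for the TensorArray so the snippet runs without text_extensions_for_pandas. It assumes the mask object converts cleanly through np.asarray.

```python
import numpy as np
import pandas as pd


def as_bool_mask(mask_like):
    """Coerce an array-like to a plain 1-D NumPy boolean mask (hypothetical helper)."""
    mask = np.asarray(mask_like, dtype=bool)
    if mask.ndim != 1:
        raise ValueError(f"expected a 1-D mask, got shape {mask.shape}")
    return mask


# Stand-in data: same values as the TensorArray in the report, kept as an ndarray.
values = np.arange(20).reshape(10, 2)
s = pd.Series(list(values))  # object-dtype Series holding one row per element

# Row-wise threshold test, reduced to one boolean per row.
row_mask = as_bool_mask(np.all(values > 8, axis=1))

# A plain NumPy boolean array is accepted as a boolean indexer.
selected = s[row_mask]
```

With a real TensorArray, the same helper would presumably wrap np.all(thresh.array, axis=1), assuming that result converts to an ndarray via np.asarray.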

BryanCutler avatar Apr 05 '21 17:04 BryanCutler

Fixed up the issue to better describe the functionality required from Pandas.

BryanCutler avatar Apr 07 '21 22:04 BryanCutler