text-extensions-for-pandas
TensorArray fails when used as boolean mask index
Pandas does not recognize an extension array that converts to a 1-D boolean NumPy array as a boolean mask, so using such an array for indexing fails.
import numpy as np
import pandas as pd
import text_extensions_for_pandas as tp
arr = tp.TensorArray(np.arange(20).reshape(10,2))
s = pd.Series(arr)
thresh = s > 8
s[np.all(thresh.array, axis=1)]
results in:
KeyError: "None of [Index([False, False, False, False, False, True, True, True, True, True], dtype='object')] are in the [index]"
or in other strange errors, because the mask is not picked up as a 1-D boolean array and pandas instead treats it as a list-like indexer or something else.
In the notebook Text_Extensions_for_Pandas_Overview, an example shows a TensorArray used as a boolean mask: s[np.all(thresh.array, axis=1)]
This is now failing when the Series tries to validate the mask. Need to find a fix or another way to do this.
This has been resolved in the notebook with a workaround. I wanted to leave this open because Pandas should be able to recognize an extension array that converts to a 1-D boolean array and use it as a boolean index.
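The thread does not say what the notebook workaround is. One plausible approach, sketched here as an assumption rather than the actual fix, is to build the mask as a plain NumPy boolean array before indexing, so that pandas never has to interpret an extension array as a mask. To keep the sketch runnable without text-extensions-for-pandas installed, it uses an object-dtype Series of rows as a stand-in for the TensorArray-backed Series; the names values, mask and selected are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the report: 10 rows of 2 values each.
values = np.arange(20).reshape(10, 2)

# Stand-in for pd.Series(tp.TensorArray(values)): one row per element.
# (Assumption: the real Series is backed by a TensorArray instead.)
s = pd.Series(list(values))

# Compute the row-wise condition on the raw ndarray and force a plain
# 1-D bool ndarray, which pandas accepts as a boolean mask.
mask = np.asarray(np.all(values > 8, axis=1), dtype=bool)

selected = s[mask]
print(selected.index.tolist())
```

With a real TensorArray, the analogous step would be to coerce the result of np.all(thresh.array, axis=1) with np.asarray(..., dtype=bool) before indexing, assuming the array converts to NumPy cleanly.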
Fixed up the issue to better describe the required functionality from Pandas