ENH: `bool` upcast to `Int64`

Open VladimirFokow opened this issue 1 year ago • 1 comment

Feature Type

  • [ ] Adding new functionality to pandas

  • [X] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish I could upcast bool to Int64.

Feature Description

I have data which can be either bool or NaN.

So a simple int(val) wouldn't work (due to NaNs producing an error).

I don't want a boolean output array, I want Int64. Is there a reason not to allow casting of bools to Int64?

# Toy example
import numpy as np
import pandas as pd

data = [True, False, pd.NA]
out = pd.Series(dtype='Int64', index=np.arange(5))

# Desired:
for val in data:
    out[0] = val  # TypeError: Invalid value 'True' for dtype Int64

Alternative Solutions

# Current workaround:
for val in data:
    out[0] = pd.Series(val, dtype='Int64')[0]

Additional Context

  • Would the integration with Arrow be a good opportunity to implement these?

  • because here it says:

    Upcasting is always according to the NumPy rules. If two different dtypes are involved in an operation, then the more general one will be used as the result of the operation.

    Maybe upcasting will no longer be "always according to the NumPy rules" but will also follow some Arrow rules?

  • Are there any other upcasting rules that are not implemented but should be?

However, this proves that such upcasting already works, so I'm puzzled why my example above doesn't:

s = pd.Series(data)
s.astype('Int64')
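
For reference, a minimal self-contained sketch of that whole-Series conversion. The name `converted` is only illustrative, and the behaviour shown is what the snippet above reports rather than a documented guarantee:

```python
import pandas as pd

# Same toy data as above: booleans mixed with a missing value.
data = [True, False, pd.NA]

# Build an object-dtype Series first, then cast it to the nullable integer dtype.
s = pd.Series(data)
converted = s.astype("Int64")

print(converted.tolist())
```

The cast happens once for the whole Series, unlike the element-wise assignment in the toy example.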

Is the BUG: tag more appropriate here?

VladimirFokow, Feb 07 '24 20:02

Thanks for the report. Perhaps a better alternative would be out[0] = val if pd.isna(val) else int(val). This avoids (somewhat expensive) Series construction.
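
A minimal sketch of that suggestion applied to the toy example from the report. Writing to position `i` rather than always to `out[0]` is an assumption made here so that the result is easier to inspect:

```python
import numpy as np
import pandas as pd

data = [True, False, pd.NA]

# Pre-allocated nullable-integer Series, initially all <NA>.
out = pd.Series(dtype="Int64", index=np.arange(len(data)))

for i, val in enumerate(data):
    # Missing values pass through unchanged; booleans become 0 or 1.
    out[i] = val if pd.isna(val) else int(val)

print(out.tolist())
```

Converting with `int(val)` only for non-missing values sidesteps the TypeError from assigning a bool directly, without building a temporary Series per element.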

cc @MarcoGorelli

rhshadrach, Feb 15 '24 04:02