pandas
ENH: `bool` upcast to `Int64`
Feature Type
- [ ] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I wish I could upcast `bool` to `Int64`.
Feature Description
I have data that can be either `bool` or `NaN`, so a simple `int(val)` doesn't work (the missing values raise an error). I don't want a boolean output array; I want `Int64`. Is there a reason not to allow casting `bool` values to `Int64`?
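A quick sketch of the failure mode described above, assuming standard Python and pandas behavior: `int()` converts bools without trouble, but missing values raise. `try_int` is a hypothetical helper for illustration only; it catches both `TypeError` and `ValueError` because the exact exception depends on which missing-value sentinel is used.

```python
import pandas as pd


def try_int(val):
    """Hypothetical helper: return int(val), or the exception type if it fails."""
    try:
        return int(val)
    except (TypeError, ValueError) as exc:
        return type(exc)


# True/False convert cleanly; the missing values (pd.NA, float NaN) do not.
results = [try_int(v) for v in [True, False, pd.NA, float("nan")]]
print(results)
```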
# Toy example
import numpy as np
import pandas as pd

data = [True, False, pd.NA]
out = pd.Series(dtype='Int64', index=np.arange(5))

# Desired:
for i, val in enumerate(data):
    out[i] = val  # TypeError: Invalid value 'True' for dtype Int64
Alternative Solutions
# Current workaround:
for i, val in enumerate(data):
    out[i] = pd.Series(val, dtype='Int64')[0]
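If all the values are available up front, a possible vectorized alternative to the per-element workaround is to build the nullable-integer Series in a single call. This is only a sketch: it relies on the same bool-to-`Int64` conversion that `pd.Series(val, dtype='Int64')` already performs in the workaround, and it does not address the per-element assignment this issue is about.

```python
import pandas as pd

# Mixed bool / missing input, as in the toy example.
data = [True, False, pd.NA]

# Constructing the whole nullable-integer Series at once casts the bools
# (True -> 1, False -> 0) and keeps pd.NA as a missing value.
out = pd.Series(data, dtype="Int64")
print(out.tolist())
```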
Additional Context
- Would the integration with Arrow be a good opportunity to implement this kind of upcasting?
- The pandas docs say: "Upcasting is always according to the NumPy rules. If two different dtypes are involved in an operation, then the more general one will be used as the result of the operation." Maybe now it won't be "always according to the NumPy rules" but also according to some Arrow rules?
- Are there any other upcasting rules that are not implemented but should be?
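For reference, a small sketch of the NumPy rule quoted above: NumPy already treats `int64` as more general than `bool`, so mixing the two promotes to `int64`. This only illustrates NumPy's promotion behavior; it says nothing about how pandas implements assignment into a nullable `Int64` Series.

```python
import numpy as np

# NumPy's promotion rules: bool combined with int64 yields int64.
print(np.result_type(np.bool_, np.int64))  # int64

# The same rule applies in arithmetic; the int dtype is pinned explicitly
# because the default integer width differs between platforms.
mixed = np.array([True, False]) + np.array([1, 2], dtype=np.int64)
print(mixed.dtype, mixed.tolist())  # int64 [2, 2]
```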
However, the following shows that this upcasting already works, so I'm puzzled why my example above doesn't:
s = pd.Series(data)
s.astype('Int64')
Is the `BUG:` tag more appropriate here?
Thanks for the report. Perhaps a better alternative would be `out[0] = val if pd.isna(val) else int(val)`. This avoids the (somewhat expensive) Series construction.
cc @MarcoGorelli
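A minimal, self-contained sketch of the suggested pattern applied inside the toy example's loop, assuming the same `data` and pre-allocated `out` Series as in the original report. Missing values are assigned as-is, and bools are converted with `int()` first, so only plain integers or `pd.NA` are written into the `Int64` Series.

```python
import numpy as np
import pandas as pd

data = [True, False, pd.NA]
out = pd.Series(dtype="Int64", index=np.arange(5))

# Leave missing values untouched; convert bools to plain ints before assigning.
for i, val in enumerate(data):
    out[i] = val if pd.isna(val) else int(val)

print(out.tolist())
```

The remaining slots (labels 3 and 4) stay missing because nothing is assigned to them.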