`float("nan")` not always converted to `pd.NA` inside series with pint dtype

Open scanzy opened this issue 8 months ago • 2 comments

Hello, I am facing this issue while building a pd.Series with pint dtype.

  1. When float("nan") is alone, it remains float("nan").
  2. When float("nan") is with other values, it is converted into pd.NA.

This is not evident when printing the series (the formatting always shows nan), but values or tolist() reveals the difference.

import pint as pt
import pandas as pd
import pint_pandas

# case 1: float nan alone
print(pd.Series([float("nan")], dtype="pint[MW]").tolist())
# gives: [<Quantity(nan, 'megawatt')>]

# case 2: float nan with other values
print(pd.Series([float("nan"), 0.0], dtype="pint[MW]").tolist())
# gives: [<Quantity(<NA>, 'megawatt')>, <Quantity(0.0, 'megawatt')>]
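
To tell the two cases apart programmatically rather than by reading the printed output, a small helper along these lines might be useful. This is only a rough sketch: `missing_kinds` and its `na_sentinel` parameter are hypothetical names, not part of pandas, Pint, or Pint-Pandas, and the stand-in `NA` object is used here only so the snippet runs without pandas installed.

```python
import math


def missing_kinds(magnitudes, na_sentinel):
    """Label each magnitude as "NA", "nan", or "value".

    `na_sentinel` is compared by identity; in practice one would pass pd.NA.
    """
    kinds = []
    for m in magnitudes:
        if m is na_sentinel:
            kinds.append("NA")
        elif isinstance(m, float) and math.isnan(m):
            # numpy.float64 subclasses float, so it is covered here too
            kinds.append("nan")
        else:
            kinds.append("value")
    return kinds


# Stand-in sentinel (assumption): replace with pd.NA when pandas is available.
NA = object()
print(missing_kinds([float("nan"), NA, 0.0], NA))
# prints: ['nan', 'NA', 'value']
```

With a pint-dtype series `s`, the input could presumably be built as `[q.magnitude for q in s.tolist()]` and the sentinel passed as `pd.NA`, assuming each element is a Quantity as in the outputs above.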

I assumed that float("nan") was the default value meaning "magnitude not set". The fact that nan is converted to pd.NA depending on the other values in the series looks a bit tricky to me: is it intended?

I am looking for a way to keep not-set values consistent (either all float("nan"), or all pd.NA), but:

  1. Trying to convert pd.NA to float("nan") has no effect.
  2. Trying to convert float("nan") to pd.NA raises a ValueError.

# test 1: trying to convert pd.NA to nan
s = pd.Series([float("nan"), 0.0], dtype="pint[MW]")
print(s.tolist())
# gives: [<Quantity(<NA>, 'megawatt')>, <Quantity(0, 'megawatt')>]

print(s.fillna(float("nan")).tolist())
# gives the same: [<Quantity(<NA>, 'megawatt')>, <Quantity(0, 'megawatt')>]


# test 2: trying to convert nan to pd.NA
s = pd.Series([float("nan")], dtype="pint[MW]")
print(s.tolist())
# gives: [<Quantity(nan, 'megawatt')>]

s.fillna(pd.NA)
# gives: ValueError: float() argument must be a string or a real number, not 'NAType'

Versions:
- Python 3.11.2
- pandas 2.2.2
- Pint 0.24.1
- Pint-Pandas 0.6

scanzy, Jun 26 '24 20:06