
Mixed units in column cause wrong results for basic operations on dataframe columns

Open brwe opened this issue 4 years ago • 2 comments

The following code produces wrong results:

import pandas as pd
import pint
import pint_pandas  # type: ignore


merged = pd.DataFrame(
    {
        "some_label": ["a", "b"],
        "some_values": [
            1 * pint.get_application_registry().t,
            1 * pint.get_application_registry().kg,
        ],
        "factors": [
            1,
            1 
        ],
    }
)
merged = merged.astype({"factors": "pint[kg/t]"})

merged["result"] = merged["factors"] * merged["some_values"]

print(merged["factors"])
print(merged["some_values"])

print([q.to("kg") for q in merged.result.values])

Output:

0    1.0
1    1.0
Name: factors, dtype: pint[kilogram / metric_ton]
0    1 metric_ton
1      1 kilogram
Name: some_values, dtype: object
[<Quantity(1.0, 'kilogram')>, <Quantity(1.0, 'kilogram')>]

The second value of the result column is wrong: 1 kg/t multiplied by 1 kg should give 0.001 kg, not 1 kg. I would have expected this output:

0    1.0 kilogram / metric_ton
1    1.0 kilogram / metric_ton
Name: factors, dtype: object
0    1 metric_ton
1      1 kilogram
Name: some_values, dtype: object
[<Quantity(1.0, 'kilogram')>, <Quantity(0.001, 'kilogram')>]

which is what I get when I build the factors column from per-row quantities (the same unit on every row) instead of casting it with astype:

import pandas as pd
import pint
import pint_pandas  # type: ignore


merged = pd.DataFrame(
    {
        "some_label": ["a", "b"],
        "some_values": [
            1 * pint.get_application_registry().t,
            1 * pint.get_application_registry().kg,
        ],
        "factors": [
            1 * pint.get_application_registry().kg / pint.get_application_registry().t,
            1 * pint.get_application_registry().kg / pint.get_application_registry().t,
        ],
    }
)

merged["result"] = merged["factors"] * merged["some_values"]

print(merged["factors"])
print(merged["some_values"])

print([q.to("kg") for q in merged.result.values])

This means that when a dataframe column holds rows with mixed units, the results of operations on it might be wrong. Am I using this incorrectly, or is this a bug?

brwe avatar Dec 14 '21 11:12 brwe

When you created the some_values and factors columns, you provided a list of quantities, which pandas stores as plain objects. You can see this in the dtypes. Calling merged = merged.astype({"factors": "pint[kg/t]"}) converts the factors column to a PintArray. Do the same for some_values and it will work.

I suggest you look at the example notebook which shows several different ways to create columns in dataframes.

andrewgsavage avatar Jan 10 '22 23:01 andrewgsavage

Thanks, but this was not a question about how to do this. What I am saying is that I get a wrong result without any warning, just because I created the dataframe columns this way rather than another. This is dangerous af. Is there a way to throw instead of just outputting a wrong result?

brwe avatar Jan 15 '22 09:01 brwe
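
One possible way to fail loudly, sketched here as a user-side check and not as anything pint or pint-pandas provides: before operating on an object-dtype column, collect the units of its items and raise if there is more than one. The helper name assert_uniform_units is hypothetical, and it only assumes that quantities expose a hashable units attribute.

```python
def assert_uniform_units(values):
    """Raise ValueError if quantity-like items carry more than one unit.

    Hypothetical helper, not part of pint or pint-pandas. Items without a
    `units` attribute (plain numbers, strings) are ignored.
    """
    # Collect the distinct units found among quantity-like items.
    units = {item.units for item in values if hasattr(item, "units")}
    if len(units) > 1:
        names = ", ".join(sorted(str(unit) for unit in units))
        raise ValueError(f"mixed units in column: {names}")
```

Calling assert_uniform_units(merged["some_values"]) before the multiplication should raise for the column in the report, since it holds both metric_ton and kilogram. This does not change what the library does; it only turns the silent case into an error in user code.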