pint-pandas

.apply not working "properly"

Open · a7p opened this issue 4 years ago · 7 comments

In our project we make heavy use of `DataFrame.apply` and `Series.apply`. With pint-pandas data, these do not set the dtype of the result properly: the result holds pint Quantities but is not given a `pint[...]` dtype. A simple workaround is to call apply as follows:

```python
import pandas as pd
import pint_pandas  # registers the "pint[...]" dtype
df = pd.DataFrame([2, 3], dtype="pint[W]")
df["new"] = df.apply(lambda x: x[0]).astype("pint[W]")
```

This requires me to set the unit manually, which I'd like to avoid. As a workaround, we patch pandas' apply functions with these functions:

```python
import pandas as pd
import pint_pandas  # noqa: F401  (registers the "pint[...]" dtype with pandas)


def _pint_dtype(unit):
    """Build a pint dtype string; an empty unit string means dimensionless."""
    return "pint[dimensionless]" if str(unit) == "" else f"pint[{unit}]"


def _to_pint_series(res):
    """Give a Series of pint Quantities a pint dtype; return anything else unchanged."""
    if isinstance(res, pd.Series) and len(res) and hasattr(res.iloc[0], "units"):
        unit = res.iloc[0].units
        # Express every element in the first element's unit, then keep the magnitudes.
        magnitudes = res.map(lambda q: q.m_as(unit))
        return magnitudes.astype(_pint_dtype(unit))
    return res


class PintApply:
    # References to the unpatched pandas methods.
    original_df_apply = pd.DataFrame.apply
    original_series_apply = pd.Series.apply

    @staticmethod
    def df_apply(manual_self, *args, **kwargs):
        """
        A pint friendly version of pandas DataFrame.apply.

        Normally `pd.DataFrame.apply` would not set the dtype for the result properly.
        """
        res = PintApply.original_df_apply(manual_self, *args, **kwargs)
        if isinstance(res, pd.DataFrame) and not res.empty:
            cols_with_units = [hasattr(res[col].iloc[0], "units") for col in res]
            if all(cols_with_units):
                # Convert column by column so each column keeps its own unit.
                return pd.concat([_to_pint_series(res[col]) for col in res], axis=1)
            elif any(cols_with_units):
                raise TypeError(
                    "This DataFrame contains pint and non-pint values - don't mix!"
                )
        return _to_pint_series(res)

    @staticmethod
    def series_apply(manual_self, *args, **kwargs):
        """
        A pint friendly version of pandas Series.apply.

        Normally `pd.Series.apply` would not set the dtype for the result properly.
        """
        res = PintApply.original_series_apply(manual_self, *args, **kwargs)
        return _to_pint_series(res)


# Route every .apply call through the wrappers above.
pd.DataFrame.apply = PintApply.df_apply
pd.Series.apply = PintApply.series_apply
```

If this is a solution you'd like to have in pint_pandas, I'll prepare a PR; if it's a non-issue, I'd be happy to learn about a better solution.

a7p, Feb 09 '21

Could you add the code that creates the df before this line?

```python
df["new"] = df.apply(lambda x: x[0]).astype("pint[W]")
```

I'm not 100% sure what's in your df before you do that.

andrewgsavage, Feb 09 '21

I've updated the original issue with the declaration of df: `df = pd.DataFrame([2, 3], dtype="pint[W]")`

a7p, Feb 10 '21

Thanks that helps.

For your series example, there is a hidden method that would avoid the need to specify the unit:

```python
import pint_pandas

PA_ = pint_pandas.PintArray
s = df.apply(lambda x: x[0])
df["new"] = PA_._from_sequence(s)
```

This wouldn't work for your df example though.

In pint-pandas, your solution would be exposed as `df.pint.apply(lambda x: x[0])`.

Another option would be to have an accessor that converts a Series or DataFrame containing sequences of pint Quantities to pint dtypes, so your example would become `df.apply(lambda x: x[0]).pint.fix_pint_dtypes()`. This would also work for other functions, like `df.max().pint.fix_pint_dtypes()`. What are your thoughts on this?

I think you should ask pandas-dev about this too; they may have a better way to do this that pint-pandas can implement. Or they might not, which would highlight missing functionality.

andrewgsavage, Feb 10 '21

Thank you! I'll look into this by the end of the week (hopefully).

a7p, Feb 15 '21

Sorry, this has lost urgency in my work, so I can't spend time on it.

a7p, Jun 29 '21

PA_._from_sequence(s)

OMG, I spent most of the afternoon trying to find such a thing. I'm so glad it exists, and so mystified as to why it is not the default behavior. In my case, we have DataFrames of uniform quantities that need to be sliced (by company) and diced (by year). The PintArray property is nicely preserved in the row direction, but not at all in the column direction. All the quantity information is still preserved in the pd.Series elements, though, hence the utility of this function.

MichaelTiemannOSC, Dec 30 '21

`Series.apply` works now, but `DataFrame.apply` still has this issue because it doesn't reach `PintArray.map()`.

andrewgsavage, Aug 15 '23