pint-pandas
.apply not working "properly"
In our project we make heavy use of DataFrame.apply and Series.apply, but with pandas these do not set the dtype of the result properly. A simple workaround is to call apply and cast the result explicitly:
df = pd.DataFrame([2, 3], dtype="pint[W]")
df["new"] = df.apply(lambda x: x[0]).astype("pint[W]")
This requires me to set the unit manually, which I'd like to avoid. As a workaround we patch pandas' apply functions with these functions:
def df_apply(manual_self, *args, **kwargs):
    """
    A pint friendly version of pandas DataFrame.apply.
    Normally `pd.DataFrame.apply` would not set the dtype for the result properly.
    """
    res = PintApply.original_df_apply(manual_self, *args, **kwargs)
    if isinstance(res, pd.DataFrame):
        # Inspect the first element of each column to see whether it is a pint quantity.
        cols_with_units = [hasattr(res[col][0], "units") for col in res]
        if all(cols_with_units):
            types = {col: f"pint[{res[col][0].units}]" for col in res}
            magnitudes = res.applymap(lambda x: x.magnitude)
            res = magnitudes.astype(types)
            return res
        elif any(cols_with_units):
            raise Exception(
                "This DataFrame contains pint and non-pint values - don't mix!"
            )
    elif isinstance(res, pd.Series):
        if hasattr(res[0], "units"):
            unit = res[0].units
            magnitude = res.transform(lambda x: x.magnitude)
            # An empty unit string means the quantity is dimensionless.
            if str(unit) == "":
                return magnitude.astype("pint[dimensionless]")
            return magnitude.astype(f"pint[{unit}]")
    return res

@staticmethod
def series_apply(manual_self, *args, **kwarg):
    """
    A pint friendly version of pandas Series.apply.
    Normally `pd.Series.apply` would not set the dtype for the result properly.
    """
    res = PintApply.original_series_apply(manual_self, *args, **kwarg)
    if hasattr(res[0], "units"):
        unit = res[0].units
        magnitude = res.transform(lambda x: x.magnitude)
        # An empty unit string means the quantity is dimensionless.
        if str(unit) == "":
            return magnitude.astype("pint[dimensionless]")
        return magnitude.astype(f"pint[{unit}]")
    return res
If this is a solution you'd like to have in pint_pandas, I'll prepare a PR. If it's a non-issue, I'd be happy to learn about a better solution.
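The post does not show how the patch is installed. Here is a minimal sketch of the wrap-and-delegate pattern that `PintApply.original_df_apply` implies. It uses plain Python only: `Frame`, `ApplyPatch`, `patched_apply` and `install` are hypothetical stand-ins, not pandas or pint-pandas names, and the post-processing step is a placeholder for the dtype fix.

```python
class Frame:
    """Toy stand-in for pd.DataFrame; not the real class."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, func):
        return [func(v) for v in self.values]


class ApplyPatch:
    """Hypothetical holder, playing the role PintApply has in the post."""

    original_apply = None  # filled in by install()

    @staticmethod
    def patched_apply(manual_self, *args, **kwargs):
        # Delegate to the saved original, then post-process the result.
        res = ApplyPatch.original_apply(manual_self, *args, **kwargs)
        return tuple(res)  # placeholder for the dtype-fixing step

    @classmethod
    def install(cls):
        # Save the original only once so repeated installs do not recurse.
        if cls.original_apply is None:
            cls.original_apply = Frame.apply
            Frame.apply = cls.patched_apply


ApplyPatch.install()
ApplyPatch.install()  # second call is a no-op
result = Frame([1, 2, 3]).apply(lambda v: v * 2)
```

Guarding `install` matters: if the original were overwritten on a second call, the wrapper would end up calling itself.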
Could you add the code that creates the df before this line?
df["new"] = df.apply(lambda x: x[0]).astype("pint[W]")
I'm not 100% sure what's in your df before you do that.
I've updated the original issue with the declaration of df
df = pd.DataFrame([2, 3], dtype="pint[W]")
Thanks, that helps.
For your series example, there is a hidden method that would avoid the need to specify the unit:
import pint_pandas
PA_ = pint_pandas.PintArray
s = df.apply(lambda x: x[0])
df["new"] = PA_._from_sequence(s)
This wouldn't work for your df example though.
Your solution would mean df.pint.apply(lambda x: x[0])
Another option would be to have an accessor that converts a series or df containing sequences of pint quantities to pint dtypes, so your example would become:
df.apply(lambda x: x[0]).pint.fix_pint_dtypes()
This would work for other functions like df.max().pint.fix_pint_dtypes()
What are your thoughts on this?
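To make the proposal concrete, here is a rough sketch of the per-column decision such a helper would have to make. It uses plain Python stand-ins rather than pandas or pint: `Qty`, `infer_pint_dtype` and `fix_columns` are hypothetical names, a dict of lists plays the role of the DataFrame, and the function only returns the dtype string instead of building a PintArray.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qty:
    """Toy stand-in for a pint Quantity: just a magnitude and a unit string."""
    magnitude: float
    units: str


def infer_pint_dtype(values):
    """Return (magnitudes, dtype_string), or (values, None) if nothing has units."""
    has_units = [hasattr(v, "units") for v in values]
    if not any(has_units):
        return list(values), None  # ordinary column, leave untouched
    if not all(has_units):
        raise ValueError("mixed pint and non-pint values")
    units = {str(v.units) for v in values}
    if len(units) > 1:
        raise ValueError(f"more than one unit in column: {sorted(units)}")
    # An empty unit string is treated as dimensionless, as in the patch above.
    unit = units.pop() or "dimensionless"
    return [v.magnitude for v in values], f"pint[{unit}]"


def fix_columns(columns):
    """Apply infer_pint_dtype to each column of a dict-of-lists 'frame'."""
    return {name: infer_pint_dtype(vals) for name, vals in columns.items()}


fixed = fix_columns({
    "power": [Qty(2, "W"), Qty(3, "W")],
    "ratio": [Qty(0.5, ""), Qty(0.7, "")],
    "label": ["a", "b"],
})
```

Unlike the patch in the original post, this sketch checks every element rather than only the first, and it leaves non-pint columns alone instead of raising when a frame mixes pint and non-pint columns. Whether that is the right behaviour for an accessor is an open design question.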
I think you should ask pandas-dev about this too. They may have a better way to do this which pint-pandas can implement. Or they might not, in which case it highlights missing functionality.
Thank you! I'll look into this by the end of the week (hopefully).
Sorry, this lost urgency among my work-related topics, so I cannot spend time on it.
PA_._from_sequence(s)
OMG, I spent most of the afternoon trying to find such a thing. I'm so glad it exists, and so mystified as to why it is not the default behavior. In my case, we have dataframes of uniform quantities that need to be sliced (by company) and diced (by year). The PintArray property is nicely preserved in the row direction, but not at all in the column direction. But all the quantity info is preserved in the pd.Series elements, hence the utility of this function.
apply works for Series now, but DataFrame.apply still has this issue because it doesn't reach PA.map()