pint
UnitStrippedWarning with pandas Series
Hi!
First things first: big fan of your library! Second: Apologies if that behaviour is expected and/or there is something about it in the docs that I didn't see.
I ran into an unexpected warning when adding a pint.Quantity to a pandas.Series. See this minimal example:
>>> import pint
>>> import pandas as pd
>>> pd.Series([pint.Quantity('8 nm')])
/versions/3.7.5/lib/python3.7/site-packages/pandas/core/dtypes/cast.py:1638: UnitStrippedWarning:
The unit of the quantity is stripped when downcasting to ndarray.
0 8 nanometer
dtype: object
I have yet to confirm but I'm 99% certain that this started after I updated numpy from 1.19.x to 1.20.1. Otherwise running pint==0.16.1 and pandas==1.2.3.
I'm happy to just ignore the warning but I was wondering if there is a "correct" way of doing this.
Cheers, Philipp
You aren't telling pandas to store the data using a PintArray, so it converts it to an ndarray of objects. You need to use the dtype argument:
import pint
import pandas as pd
import pint_pandas
pd.Series([pint.Quantity('8 nm')], dtype="pint[nm]")
0 8
dtype: pint[nanometer]
That makes sense - thanks for the quick response. Some of my confusion comes from the observation that when I do the same thing with a DataFrame, there is no warning (despite the dtype ending up as object too):
>>> df = pd.DataFrame([[pint.Quantity('8 nm')]], columns=['pint'])
>>> df
pint
0 8 nanometer
>>> df.dtypes
pint object
dtype: object
I have a similar issue when doing the following:
import pandas as pd
from pint import Quantity
d = {'variable_value': [60.014000, 60.015000, 60.012886], 'unit': ["hertz", "hertz", "hertz"]}
df = pd.DataFrame(data=d)
df['quantity'] = df.apply(lambda x: Quantity(x['variable_value'], x['unit']), axis=1)
But the result dataframe looks correct:
variable_value unit quantity
0 60.014000 hertz 60.014 hertz
1 60.015000 hertz 60.015 hertz
2 60.012886 hertz 60.012886 hertz
How can I force the result to be a Quantity and avoid the warning?
df['quantity'] = df.apply(lambda x: Quantity(x['variable_value'], x['unit']), axis=1).astype("pint[Hz]")
It does not look correct: when you can see the unit in the column's values, you can tell the column has not been read in as a pint dtype.
df.dtypes
variable_value float64
unit object
quantity object
dtype: object
as opposed to
variable_value float64
unit object
quantity pint[hertz]
dtype: object
I suggest reading through the example notebook. https://github.com/hgrecco/pint-pandas/blob/master/notebooks/pint-pandas.ipynb
@andrewgsavage Thanks for the reply. Although the provided code works for my previous example, my actual case has multiple units in the same dataframe. Imagine something like this:
import pandas as pd
from pint import Quantity
import pint_pandas
d = {'variable_value': [1, 62, 2], 'unit': ["m", "W", "m"]}
df = pd.DataFrame(data=d)
df['quantity'] = df.apply(lambda x: Quantity(x['variable_value'], x['unit']), axis=1)
Ideally I would like the column to have a generic Quantity type, instead of naming a specific unit, so that each row could have a different unit.
Is this possible?
Vitor Henrique
no, the unit is stored with the column
TL;DR: I recommend closing this issue (raised by @schlegelp), as it should get resolved by PR #1909.
While dtype='pint[nm]' would be a preferable way of storing units in pd.Series or columns of pd.DataFrame objects, there are cases where it makes more sense to store the unit of each quantity alongside the magnitude. Typically this is the case when units are not compatible (different dimensionalities; see example by @vitormhenrique from above).
Let's look at this example
import numpy as np
import pandas as pd
import pint
values = [i * pint.Quantity('m') for i in range(5)]
ndarray = np.asarray(values, dtype="object")
series = pd.Series(values)
print(values, series, ndarray)
which gives this output
[0 <Unit('meter')>,
1 <Unit('meter')>,
2 <Unit('meter')>,
3 <Unit('meter')>,
4 <Unit('meter')>]
0 0 meter
1 1 meter
2 2 meter
3 3 meter
4 4 meter
dtype: object
array([<Quantity(0, 'meter')>, <Quantity(1, 'meter')>,
<Quantity(2, 'meter')>, <Quantity(3, 'meter')>,
<Quantity(4, 'meter')>], dtype=object)
The current release of pint (v0.23.x) emits a UnitStrippedWarning twice:
UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
result[:] = values
Clearly, this is wrong. The units are not stripped. The code works as expected: it creates a pd.Series and a np.ndarray of pint.Quantity objects.
It's not clear whether this bug in the current version of pint has the same cause as the issue brought up by @schlegelp for pint v0.16.1, but the symptoms are at least similar.
However, I can confirm that the code changes in PR #1909 will resolve the problem I'm describing above and silence these warnings. The PR was merged into the master branch in late Dec '23 and will likely be part of pint v0.24.x when it is published.