
UnitStrippedWarning with pandas Series

Open schlegelp opened this issue 3 years ago • 7 comments

Hi!

First things first: big fan of your library! Second: Apologies if that behaviour is expected and/or there is something about it in the docs that I didn't see.

I ran into an unexpected warning when adding a pint.Quantity to a pandas.Series. See this minimal example:

>>> import pint
>>> import pandas as pd
>>> pd.Series([pint.Quantity('8 nm')])
/versions/3.7.5/lib/python3.7/site-packages/pandas/core/dtypes/cast.py:1638: UnitStrippedWarning:

The unit of the quantity is stripped when downcasting to ndarray.

0    8 nanometer
dtype: object

I have yet to confirm but I'm 99% certain that this started after I updated numpy from 1.19.x to 1.20.1. Otherwise running pint==0.16.1 and pandas==1.2.3.

I'm happy to just ignore the warning but I was wondering if there is a "correct" way of doing this.

Cheers, Philipp

schlegelp avatar Mar 15 '21 12:03 schlegelp

You aren't telling pandas to store the data in a PintArray, so it converts it to an ndarray of objects. You need to pass the dtype argument:

import pint
import pandas as pd
import pint_pandas
pd.Series([pint.Quantity('8 nm')], dtype="pint[nm]")

0    8
dtype: pint[nanometer]

andrewgsavage avatar Mar 15 '21 21:03 andrewgsavage

That makes sense - thanks for the quick response. Some of my confusion comes from the observation that when I do the same thing with a DataFrame, there is no warning (even though the dtype ends up as object too):

>>> df = pd.DataFrame([[pint.Quantity('8 nm')]], columns=['pint'])
>>> df
          pint
0  8 nanometer
>>> df.dtypes
pint    object
dtype: object

schlegelp avatar Mar 16 '21 09:03 schlegelp

I have a similar issue when doing the following:

import pandas as pd
from pint import Quantity
d = {'variable_value': [60.014000, 60.015000, 60.012886], 'unit': ["hertz", "hertz", "hertz"]}
df = pd.DataFrame(data=d)
df['quantity'] = df.apply(lambda x: Quantity(x['variable_value'], x['unit']), axis=1)

But the result dataframe looks correct:

   variable_value   unit         quantity
0       60.014000  hertz     60.014 hertz
1       60.015000  hertz     60.015 hertz
2       60.012886  hertz  60.012886 hertz

How can I force the result to be a Quantity and avoid the warning?

vitormhenrique avatar Jan 18 '22 16:01 vitormhenrique

df['quantity'] = df.apply(lambda x: Quantity(x['variable_value'], x['unit']), axis=1).astype("pint[Hz]")

It does not look correct: when you can see the unit in each cell of the column, you can tell it has not been read correctly.

df.dtypes

variable_value    float64
unit               object
quantity           object
dtype: object

as opposed to

variable_value        float64
unit                   object
quantity          pint[hertz]
dtype: object

I suggest reading through the example notebook. https://github.com/hgrecco/pint-pandas/blob/master/notebooks/pint-pandas.ipynb

andrewgsavage avatar Jan 19 '22 01:01 andrewgsavage

@andrewgsavage Thanks for the reply. Although the provided code does work for my previous example, my actual case has multiple units in the same dataframe. Imagine something like this:

import pandas as pd
from pint import Quantity
import pint_pandas

d = {'variable_value': [1, 62, 2], 'unit': ["m", "W", "m"]}
df = pd.DataFrame(data=d)
df['quantity'] = df.apply(lambda x: Quantity(x['variable_value'], x['unit']), axis=1)

Ideally I would like the column to have a generic Quantity type instead of a dtype that names a specific unit, so that each row could have a different unit.

Is this possible?

Vitor Henrique

vitormhenrique avatar Jan 19 '22 03:01 vitormhenrique

No, the unit is stored with the column, so every row in a pint column shares the same unit.

andrewgsavage avatar Jan 19 '22 11:01 andrewgsavage

TL;DR: I recommend closing this issue (raised by @schlegelp), as it should get resolved by PR #1909.

While dtype='pint[nm]' would be a preferable way of storing units in pd.Series or columns of pd.DataFrame objects, there are cases where it makes more sense to store the unit of each quantity alongside the magnitude. Typically this is the case when units are not compatible (different dimensionalities; see example by @vitormhenrique from above).

Let's look at this example

import numpy as np
import pandas as pd
import pint

values = [i * pint.Quantity('m') for i in range(5)]
ndarray = np.asarray(values, dtype="object")
series = pd.Series(values)

print(values, series, ndarray)

which gives this output

[0 <Unit('meter')>,
 1 <Unit('meter')>,
 2 <Unit('meter')>,
 3 <Unit('meter')>,
 4 <Unit('meter')>]
0    0 meter
1    1 meter
2    2 meter
3    3 meter
4    4 meter
dtype: object
array([<Quantity(0, 'meter')>, <Quantity(1, 'meter')>,
       <Quantity(2, 'meter')>, <Quantity(3, 'meter')>,
       <Quantity(4, 'meter')>], dtype=object)

The current release of pint (v0.23.x) emits a UnitStrippedWarning twice.

UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  result[:] = values

Clearly, this is wrong. The units are not stripped. The code works as expected: it creates a pd.Series and a np.ndarray of pint.Quantity objects.

It's not clear whether this bug in the current version of pint is the same as the issue brought up by @schlegelp for pint v0.16.1, but the two are at least related.

However, I can confirm that the code changes in PR #1909 resolve the problem I'm describing above and silence those warnings. The PR was merged into the master branch in late Dec '23 and will likely be part of pint v0.24.x when it is published.

PhilippVerpoort avatar Feb 01 '24 16:02 PhilippVerpoort