Pandas ufloat error after upgrade.

Open oscarbranson opened this issue 4 years ago • 5 comments

I recently upgraded pandas to 1.1.0, and found that it is no longer possible to assign a ufloat to a cell in a pandas.DataFrame.

Minimal Working Example

import numpy as np
import pandas as pd
import uncertainties as un
import uncertainties.unumpy as unp

df = pd.DataFrame(np.zeros((2,2)), columns=['A', 'B'])

df.loc[0, 'A'] = unp.uarray(0, 2)  # this works fine

df.loc[0, 'A'] = un.ufloat(0, 2)  # this fails


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-62a59a1e82d7> in <module>
----> 1 df.loc[0, 'A'] = un.ufloat(0, 2)

~/.python/py3/lib/python3.8/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    668 
    669         iloc = self if self.name == "iloc" else self.obj.iloc
--> 670         iloc._setitem_with_indexer(indexer, value)
    671 
    672     def _validate_key(self, key, axis: int):

~/.python/py3/lib/python3.8/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
   1548             if 1 < blk.ndim:  # in case of dict, keys are indices
   1549                 val = list(value.values()) if isinstance(value, dict) else value
-> 1550                 take_split_path = not blk._can_hold_element(val)
   1551 
   1552         # if we have any multi-indexes that have non-trivial slices

~/.python/py3/lib/python3.8/site-packages/pandas/core/internals/blocks.py in _can_hold_element(self, element)
   1922         tipo = maybe_infer_dtype_type(element)
   1923         if tipo is not None:
-> 1924             return issubclass(tipo.type, (np.floating, np.integer)) and not issubclass(
   1925                 tipo.type, (np.datetime64, np.timedelta64)
   1926             )

TypeError: issubclass() arg 1 must be a class

It appears to be the result of some new type checking that happens when values are assigned to the dataframe. The new value must return True from

issubclass(tipo.type, (np.floating, np.integer)) and not issubclass(tipo.type, (np.datetime64, np.timedelta64))

However, issubclass throws an error when given a ufloat (or a uarray). I guess this must mean that the uarray passes an earlier test, whereas the ufloat continues to here.
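
For reference, the message at the bottom of the traceback is the one Python's built-in issubclass produces whenever its first argument is not a class. Below is a minimal sketch using only the standard library. The name `not_a_class` and its value are hypothetical stand-ins, not what pandas actually passes internally:

```python
# Hypothetical stand-in for a non-class object reaching issubclass();
# this is NOT the value pandas passes, just an illustration.
not_a_class = 1.5

try:
    issubclass(not_a_class, (float, int))
except TypeError as exc:
    message = str(exc)
else:
    message = None

print(message)  # issubclass() arg 1 must be a class

# Guarding with isinstance(..., type) avoids the exception entirely.
safe_result = isinstance(not_a_class, type) and issubclass(not_a_class, (float, int))
print(safe_result)  # False
```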

I appreciate that this isn't a specific problem with the uncertainties codebase, but it feels like a change is more likely here than in the behemoth that is pandas.

Temporary Workaround

Ensure that all uncertainty values are passed to pandas as uarray.

oscarbranson, Jul 31 '20 11:07

Interesting.

Does the unumpy assignment really work as intended, i.e. is the final dataframe value correct? I would have guessed that this too should have failed, because I expect column A to be of type float, not object. This would mean that the proper behavior would be for Pandas to fail even in the unumpy case!

A corollary is that I am expecting that column A should have type object so as to accommodate values with uncertainties.

lebigot, Aug 01 '20 04:08

Now that I have access to a computer again, I checked the above. Indeed, the crux of the problem is that you can only save numbers with uncertainties in object columns (not float, as in the original example).

Thus, when you do df.loc[0, 'A'] = unp.uarray(0, 2), Pandas converts column A to type object and things work as you expect.

However, when you do df.loc[0, 'A'] = un.ufloat(0, 2), Pandas cannot put an object into a float column, which is normal.

If things behaved differently before, it's because Pandas didn't complain, although it could have. (I think it would make sense for it to complain even in the array case above, because the old and new column types don't match.)

If you want a column to contain numbers with uncertainties, you must declare it with the proper Pandas type:

df = pd.DataFrame(np.zeros((2,2)), columns=['A', 'B']).astype({"A": object})  # Note the astype()

With this, df.loc[0, 'A'] = un.ufloat(0, 2) does work as expected.

How does this sound to you?

lebigot, Aug 05 '20 09:08

Sounds good! The only thing that could be improved is the error message that pandas gives: it's pretty opaque! Though I appreciate this is not your department.

Might be useful to note this in the uncertainties docs somewhere?

oscarbranson, Aug 05 '20 11:08

Yes, I was thinking the same thing. Let me reopen this issue, which becomes: "Docs: discuss the need to have object columns in Pandas data frames (or NumPy arrays) if you need to store numbers with uncertainties there."

lebigot, Aug 05 '20 12:08

Just a note that dtype='object' is toxic to Pint-Pandas, so a better answer would be to implement an ExtensionDtype for uncertainties, as first proposed here: https://github.com/lebigot/uncertainties/issues/150

MichaelTiemannOSC, Oct 11 '22 15:10