uncertainties
Pandas ufloat error after upgrade.
I recently upgraded pandas to 1.1.0 and found that it is no longer possible to assign a ufloat to a cell in a pandas.DataFrame.
Minimal Working Example
import numpy as np
import pandas as pd
import uncertainties as un
import uncertainties.unumpy as unp
df = pd.DataFrame(np.zeros((2,2)), columns=['A', 'B'])
df.loc[0, 'A'] = unp.uarray(0, 2) # this works fine
df.loc[0, 'A'] = un.ufloat(0, 2) # this fails
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-62a59a1e82d7> in <module>
----> 1 df.loc[0, 'A'] = un.ufloat(0, 2)
~/.python/py3/lib/python3.8/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
668
669 iloc = self if self.name == "iloc" else self.obj.iloc
--> 670 iloc._setitem_with_indexer(indexer, value)
671
672 def _validate_key(self, key, axis: int):
~/.python/py3/lib/python3.8/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
1548 if 1 < blk.ndim: # in case of dict, keys are indices
1549 val = list(value.values()) if isinstance(value, dict) else value
-> 1550 take_split_path = not blk._can_hold_element(val)
1551
1552 # if we have any multi-indexes that have non-trivial slices
~/.python/py3/lib/python3.8/site-packages/pandas/core/internals/blocks.py in _can_hold_element(self, element)
1922 tipo = maybe_infer_dtype_type(element)
1923 if tipo is not None:
-> 1924 return issubclass(tipo.type, (np.floating, np.integer)) and not issubclass(
1925 tipo.type, (np.datetime64, np.timedelta64)
1926 )
TypeError: issubclass() arg 1 must be a class
It appears to be the result of some new type checking that happens when values are assigned to the dataframe. The new value must return True from
issubclass(tipo.type, (np.floating, np.integer)) and not issubclass(tipo.type, (np.datetime64, np.timedelta64))
However, issubclass throws an error when given a ufloat (or a uarray). I guess this must mean that the uarray passes an earlier test, whereas the ufloat continues to here.
I appreciate that this isn't a specific problem with the uncertainties codebase, but it feels like a change is more likely here than in the behemoth that is pandas.
Temporary Workaround
Ensure that all uncertainty values are passed to pandas as uarray.
Interesting.
Does the unumpy assignment really work as intended, i.e. is the final dataframe value correct? I would have guessed that this too should have failed, because I expect column A to be of type float, not object. This would mean that the proper behavior would be for Pandas to fail even in the unumpy case!
A corollary is that I am expecting that column A should have type object so as to accommodate values with uncertainties.
Now that I have access to a computer again, I checked the above. Indeed, the crux of the problem is that you can only save numbers with uncertainties in object columns (not float, as in the original example).
Thus, when you do df.loc[0, 'A'] = unp.uarray(0, 2), Pandas converts column A to type object and things work as you expect.
However, when you do df.loc[0, 'A'] = un.ufloat(0, 2), Pandas cannot put an object into a float column, which is normal.
If things behaved differently before, it's because Pandas didn't complain, although it could have. I think it would even make sense for it to complain in the array case above, because the old and new column types don't match.
If you want a column to contain numbers with uncertainties, you must declare it with the proper Pandas type:
df = pd.DataFrame(np.zeros((2,2)), columns=['A', 'B']).astype({"A": object}) # Note the astype()
With this, df.loc[0, 'A'] = un.ufloat(0, 2) does work as expected.
How does this sound to you?
Sounds good! The only thing that could be improved is the error message that pandas gives: it's pretty opaque! Though I appreciate this is not your department.
Might be useful to note this in the uncertainties docs somewhere?
Yes, I was thinking the same thing. Let me reopen this issue, which becomes: "Docs: discuss the need to have object columns in Pandas data frames (or NumPy arrays) if you need to store numbers with uncertainties there."
Just a note that dtype='object' is toxic to Pint-Pandas, so a better answer would be to implement an ExtensionDtype for uncertainties, as first proposed here: https://github.com/lebigot/uncertainties/issues/150