traitlets
Notifying a trait with a DataFrame instance throws ValueError
Hey there,
First of all, thank you for this amazing library!
I noticed that linking objects with the link function fails when the value of a trait is a DataFrame. To me it looks like the comparison logic overridden by pandas is causing the problem. Here is an example:
import traitlets
from traitlets import link, directional_link
import pandas as pd

class SomeClass(traitlets.HasTraits):
    df = traitlets.Instance(klass=pd.DataFrame, allow_none=True)

foo = SomeClass()
baz = SomeClass()
bar = SomeClass()

# Will not work
link((foo, "df"), (baz, "df"))
foo.df = pd.DataFrame()  # Throws ValueError
Stacktrace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
traitlets_dataframe.ipynb Cell 4' in <cell line: 5>()
2 link((foo, "df"), (baz, "df"))
4 # Throws ValueError
----> 5 foo.df = pd.DataFrame()
File .\lib\site-packages\traitlets\traitlets.py:712, in TraitType.__set__(self, obj, value)
710 raise TraitError('The "%s" trait is read-only.' % self.name)
711 else:
--> 712 self.set(obj, value)
File .\lib\site-packages\traitlets\traitlets.py:701, in TraitType.set(self, obj, value)
697 silent = False
698 if silent is not True:
699 # we explicitly compare silent to True just in case the equality
700 # comparison above returns something other than True/False
--> 701 obj._notify_trait(self.name, old_value, new_value)
File .\lib\site-packages\traitlets\traitlets.py:1371, in HasTraits._notify_trait(self, name, old_value, new_value)
1370 def _notify_trait(self, name, old_value, new_value):
-> 1371 self.notify_change(
1372 Bunch(
1373 name=name,
1374 old=old_value,
1375 new=new_value,
1376 owner=self,
1377 type="change",
1378 )
1379 )
File .\lib\site-packages\traitlets\traitlets.py:1383, in HasTraits.notify_change(self, change)
1381 def notify_change(self, change):
1382 """Notify observers of a change event"""
-> 1383 return self._notify_observers(change)
File .\lib\site-packages\traitlets\traitlets.py:1428, in HasTraits._notify_observers(self, event)
1425 elif isinstance(c, EventHandler) and c.name is not None:
1426 c = getattr(self, c.name)
-> 1428 c(event)
File .\lib\site-packages\traitlets\traitlets.py:366, in link._update_target(self, change)
364 with self._busy_updating():
365 setattr(self.target[0], self.target[1], self._transform(change.new))
--> 366 if getattr(self.source[0], self.source[1]) != change.new:
367 raise TraitError(
368 "Broken link {}: the source value changed while updating "
369 "the target.".format(self)
370 )
File .\lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
1525 @final
1526 def __nonzero__(self):
-> 1527 raise ValueError(
1528 f"The truth value of a {type(self).__name__} is ambiguous. "
1529 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1530 )
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
As you can see, the problem is at traitlets.py:366, because getattr(self.source[0], self.source[1]) != change.new does not return a bool in the case of a DataFrame. It returns an elementwise DataFrame, and evaluating that in the if statement raises the ValueError.
Would it be possible to make this function compatible with pandas, or possibly define a custom function for comparison?
Thank you in advance!
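To illustrate the failure mode without needing pandas installed, here is a minimal sketch. The classes Frame and AmbiguousResult and the two helper functions are hypothetical stand-ins, not part of pandas or traitlets; they only mimic the behaviour seen in the traceback, where != returns an object that refuses to be converted to a bool.

```python
class AmbiguousResult:
    """Hypothetical stand-in for the object a DataFrame comparison returns."""

    def __bool__(self):
        # pandas raises a similar ValueError when a DataFrame is used as a bool
        raise ValueError("The truth value of this result is ambiguous.")


class Frame:
    """Hypothetical stand-in for pandas.DataFrame: comparisons are not plain bools."""

    def __eq__(self, other):
        return AmbiguousResult()

    def __ne__(self, other):
        return AmbiguousResult()


def value_check(current, expected):
    # Same shape as the check in the traceback: `if a != b:`
    if current != expected:
        return "changed"
    return "unchanged"


def identity_check(current, expected):
    # Identity comparison never calls __ne__, so it always yields a bool
    if current is not expected:
        return "changed"
    return "unchanged"


frame = Frame()

try:
    value_check(frame, frame)
    value_error_raised = False
except ValueError:
    value_error_raised = True

identity_result = identity_check(frame, frame)
```

The value-based check fails even when both sides are the very same object, which matches what happens in link._update_target after the target has been set.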
I've run into the same issue when trying to observe an Instance(klass=pd.DataFrame).
I am not sure this is really a traitlets problem; it looks more like the "bad behaviour" of the DataFrame __ne__ function. My interpretation is that traitlets is asking whether the objects are equal, which gets a little messy when we are talking about non-primitive types. For an Instance, and perhaps some other cases, maybe the check should be more like is not?
I've worked around it in my case by overriding __ne__, with something like:
import pandas

class MyDataFrame(pandas.DataFrame):
    def __ne__(self, other):
        # Compare by identity instead of elementwise
        return self is not other
(although this is causing other headaches... )
I'd recommend overriding this value-based comparison behavior of link in _update_source and _update_target with an identity-based comparison (i.e. use the is operator). This way you don't have to subclass DataFrame.
If you'd like to make a contribution, it would be better if link had a _should_update method that _update_source and _update_target could call to check if a value has changed. This way you could override this one method in order to implement your desired comparison behavior.
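A rough sketch of that idea, done as a subclass rather than a patch to traitlets itself. Everything here is an assumption for illustration: HookedLink and its _has_changed hook are hypothetical names (playing the role of the suggested _should_update), and the overrides rely on private attributes of link (updating, _busy_updating, _transform, _transform_inv) as they appear in traitlets 5.x, which may differ in other versions. Frame is a hypothetical stand-in for pd.DataFrame so the sketch runs without pandas.

```python
import traitlets


class AmbiguousResult:
    """Hypothetical stand-in for the result of a DataFrame comparison."""

    def __bool__(self):
        raise ValueError("The truth value of this result is ambiguous.")


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __eq__(self, other):
        return AmbiguousResult()

    def __ne__(self, other):
        return AmbiguousResult()


class HookedLink(traitlets.link):
    """link variant whose consistency check lives in one overridable method."""

    def _has_changed(self, current, expected):
        # Hypothetical hook: identity comparison, so __ne__ is never called
        return current is not expected

    def _update_target(self, change):
        if self.updating:
            return
        with self._busy_updating():
            setattr(self.target[0], self.target[1], self._transform(change.new))
            if self._has_changed(getattr(self.source[0], self.source[1]), change.new):
                raise traitlets.TraitError(
                    f"Broken link {self}: the source value changed while updating the target."
                )

    def _update_source(self, change):
        if self.updating:
            return
        with self._busy_updating():
            setattr(self.source[0], self.source[1], self._transform_inv(change.new))
            if self._has_changed(getattr(self.target[0], self.target[1]), change.new):
                raise traitlets.TraitError(
                    f"Broken link {self}: the target value changed while updating the source."
                )


class SomeClass(traitlets.HasTraits):
    df = traitlets.Instance(klass=Frame, allow_none=True)


foo = SomeClass()
baz = SomeClass()
HookedLink((foo, "df"), (baz, "df"))

foo.df = Frame()  # propagates to baz without evaluating `!=` as a bool
```

A subclass that wants value semantics for some other type could override only _has_changed. Because this depends on private attributes, it should be treated as a stopgap rather than a supported extension point.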
EDIT (WARNING): This does not work when there are multiple observers on the trait, since the raised error will still interrupt the loop over callbacks.
I found a workaround by creating a dataframe trait type that ignores that particular error:
import pandas as pd
import traitlets

class TraitletsPandasDataFrame(traitlets.Instance):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, klass=pd.DataFrame, **kwargs)

    def __set__(self, obj: traitlets.HasTraits, value) -> None:
        # Ignore error raised by old and new dataframes comparison
        # see https://github.com/ipython/traitlets/issues/756
        try:
            super().__set__(obj, value)
        except ValueError as e:
            if not (
                len(e.args) > 0
                and isinstance(e.args[0], str)
                and e.args[0].startswith("The truth value of a DataFrame is ambiguous.")
            ):
                raise

class SomeClass(traitlets.HasTraits):
    df = TraitletsPandasDataFrame()
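The limitation mentioned in the EDIT warning can be seen in a small sketch that does not involve pandas at all. The Model class and both observers are hypothetical, and the first observer simply raises a ValueError to simulate the failing comparison inside link. It assumes observers registered for the same trait are called in registration order.

```python
import traitlets


class Model(traitlets.HasTraits):
    value = traitlets.Int(0)


calls = []


def failing_observer(change):
    # Simulates a callback (such as link's) that raises during notification
    calls.append("first")
    raise ValueError("simulated comparison failure")


def second_observer(change):
    calls.append("second")


model = Model()
model.observe(failing_observer, names="value")
model.observe(second_observer, names="value")

try:
    model.value = 1
except ValueError:
    # Swallowing the error here is what the __set__ workaround does
    pass
```

After this runs, the new value is stored but only the first observer has been called, so catching the error in __set__ hides the exception without delivering the change to the remaining observers.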