traits
traits copied to clipboard
TraitSet, TraitList and TraitDict: validate _all_ incoming values?
The current policy in TraitList
, TraitSet
and friends is that values that are going to be added to the list/set/whatever are passed through the item validator, while values being removed are not.
This makes some sense: one worry here is that we might do something like this, where mymodel
is a HasTraits
instance and my_set
is a Set
-based trait on mymodel
:
element = find_matching_element(mymodel.my_set)
do_something_with(element)
mymodel.my_set.remove(element)
Here element
has likely come from direct iteration over the TraitSet
object, and so should be passed directly to the remove
call without any transformation. If we passed element
through the item_validator
and the validation changed it in some way, so that the remove
call then failed, that's clearly undesirable.
However, the logic needed to follow this policy in something like TraitSet.symmetric_difference
gets really ugly, and the result that some of the elements of the r.h.s to TraitSet.symmetric_difference
will be passed through the validator, while others won't, depending on the exact contents of the set being modified, seems like a recipe for confusion.
I think there may be a better/simpler way, namely:
- assume and document that
item_validator
s should be idempotent: i.e., that for an already validated itemitem
,item_validator(item) == item
. (Probably actually identical rather than just equal in practice, but equality is what matters for collections.) This is already true for all item validators that we care about, I think, including for coercing trait types likeCStr
and potentially value-modifying trait types likeFloat
(consider integer inputs toFloat
, for example). - validate all incoming values to set/list/dict operations. (Credit to @midhun-pm, who originally suggested that we do this, while I originally shot the idea down.)
The interesting side-effect of the second point is that given a Set(Int)
trait, an operation like my_set.remove("not an integer")
would raise TraitError
instead of the expected KeyError
. The exception in this situation would likely represent a coding bug anyway (rather than being an expected error that people expected to handle), so I think this is a reasonable tradeoff for the extra simplicity and predictability of the TraitSet
operations.
validate all incoming values to set/list/dict operations.
Would that include __contains__
too?
Would that include
__contains__
too?
Yes, for simplicity: the rule would be that everything gets passed through the normalization / validation.
It's going to make some things work, and others not. But I think that's true no matter what we do. (For example, if x in myobj.myset
is true, we'd expect myobj.myset.remove(x)
not to raise.)
Aargh. But __contains__
shouldn't raise anything at all, as a general rule, which was of course your point. :-(
But I think we're okay: we still validate, but we don't pass on the TraitError
: we catch it and return False
.
Yep. Catch it and return False
would be fine.
Just to clarify... for removing values, the proposal here is only about validating the elements provided, but not to transform them before removal, right? If that's true, I am seeing how the ugly block in TraitSet.symmetric_difference
you quoted can be made simpler...
I am seeing
I am sorry... I keep missing important negation. I meant to say "I am not seeing".
the proposal here is only about validating the elements provided, but not to transform them before removal, right
I don't know how you can separate the two, given that item_validator
is doing both validation and normalization/transformation. (Which is fine.)
But no, the proposal would be to validate and normalize, and remove the normalised elements.
(BTW, I think "normalization" is a better way of thinking about the transformation that the validator is doing than "transformation", since it suggests that idempotence that we do have in practice.)
So in other words, the first line of symmetric_difference_update
would be:
values = {self.item_validator(value) for value in values}
and from that point on we wouldn't do anything more with the original values or the item validator.
I see. Now I understand.
Just putting this on the table for consideration:
Deprecate the use of trait types that do casting in containers such that something like this will raise: Dict(CStr)
.
Then we can compare the cost/feasibility of doing it and the cost/feasibility of prohibiting it...
Deprecate the use of trait types that do casting in containers such that something like this will raise:
Dict(CStr)
.
I'd be okay with that, but I think it doesn't solve this problem: we likely still want to be able to do List(Float)
or perhaps Set(Float)
, and Float
also potentially does value-changing conversions (e.g., from integers).
I'm fairly sure we want to keep supporting conversions of integers to floats.
IOW, I think that proposal is orthogonal to this one.
we likely still want to be able to do List(Float) or perhaps Set(Float), and Float also potentially does value-changing conversions (e.g., from integers).
But Python handles that, so we should be okay?:
>>> x = [1.0, 2.0, 3.0]
>>> x.remove(1)
>>> x
[2.0, 3.0]
>>> x = set([1.0, 2.0, 3.0])
>>> x.remove(1)
>>> x
{2.0, 3.0}
Those are the cases where the value doesn't change (i.e., the transformed value is still ==
to the old one). Here's an example where it does:
>>> class A(HasTraits):
... foo = List(Float)
...
>>> x = 10**23
>>> a = A()
>>> a.foo.append(x)
>>> a.foo.remove(x) # expect this to work ...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mdickinson/Enthought/ETS/traits/traits/trait_list_object.py", line 716, in remove
super().remove(value)
File "/Users/mdickinson/Enthought/ETS/traits/traits/trait_list_object.py", line 395, in remove
super().remove(value)
ValueError: list.remove(x): x not in list
Here's one that I think is more plausible:
>>> from traits.api import *
>>> from pathlib import Path
>>> class A(HasTraits):
... files = List(File)
...
>>> p = Path("/Users/mdickinson/somefile.txt")
>>> a = A()
>>> a.files.append(p)
>>> a.files.remove(p)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mdickinson/Enthought/ETS/traits/traits/trait_list_object.py", line 716, in remove
super().remove(value)
File "/Users/mdickinson/Enthought/ETS/traits/traits/trait_list_object.py", line 395, in remove
super().remove(value)
ValueError: list.remove(x): x not in list
I am in favour of this proposal in its general sweep.
I think it should be explicitly documented that validation is expected to be idempotent up to equality (ie. validate(x) == validate(validate(x))
) independently of what we decide to do about when validation occurs. And, of course, make sure that existing validators do in fact follow the rule :)
Separately, I think for containers we should validate everything and then do comparison, as every situation where there may be a difference of behaviour seems to me like it is a bug - the current behaviour is almost certainly not what the person who wrote the code expects. I think the new TraitList
, TraitSet
objects should do this.
The case where the sort of exception returned changes does worry me a bit, and so there may need to be discussion about whether we want to catch and transform some of these errors. Eg. for a = Set(Int)
I would expect that either:
-
a.discard('foo')
does not raise anda.remove('foo')
raises a KeyError; or - both
a.discard('foo')
anda.remove('foo')
raise TraitError
I can see the argument for both, but I feel like the first may be needed simply for backwards compatibility reasons. I think the heuristic can be "if validate(x) raises, then x is not in the collection" and proceed as if that were the case (either raising KeyError, or computing the result). Things like symmetric difference get a bit gnarly, but I think can be resolved: a ^= {'foo'}
should raise a TraitError via the logic that 'foo' fails to validate so it is not in a
so the symmetric difference means that we need to add it to a
, but that fails the traits constraints, so it is a TraitError.
However we go, we should be able to document a simple logic that allows developers to understand and anticipate the behaviour.
I updated the title to clarify that this isn't just about TraitSet
.
Yes, I agree that a.discard("foo")
should maintain the promise that discard
never raises, mostly because we should maintain the equivalence between a LBYL check (if obj in my_set: my_set.remove(obj)
) and the equivalent my_set.discard(obj)
, and we definitely don't want the containment check to raise.
It's not 100% clear cut, though: if someone were attempting to discard something that's actually of the wrong type, then that's likely a coding error somewhere, but still.
BTW, this is tentatively in 6.1.0, partly just because I think we should discuss it before the release, but I don't think any changes necessarily have to happen in 6.1.0.
OTOH, I want to avoid the death-by-a-thousand-cuts situation where the behaviour changes in subtle ways multiple times across releases.
From my own experience of using traits, I would quite readily accept that this raises:
>>> class A(HasTraits):
... files = List(File)
...
>>> p = Path("/Users/mdickinson/somefile.txt")
>>> a = A()
>>> a.files.append(p)
>>> a.files.remove(p)
Because if I try to print a.files
, I can immediately see that the value in the list is a string, and I understand the transformation that happens at the time of insertion. In fact, most of the time I don't even remember the list on an instance of HasTraits
is not just an ordinary list. Doing normalization for __contains__
or similar logic will repaint that mental model.
most of the time I don't even remember the list on an instance of
HasTraits
is not just an ordinary list
I think that's exactly the problem, though: we should make it possible for users to not have to think about the specialness (especially since some of those users could be third-party code that thinks it's just handling a regular list). And a user would expect that after you append
something
to a list my_list
, then something in my_list
would return True
. At the moment, we break that invariant. I think this proposal brings TraitList
closer to list
, reduces the number of surprises like this, and makes it easier for people not to have to care that they have a TraitList
rather than a plain old list
.
And for sets in particular, there are going to be algorithms (e.g., graph traversal algorithms) that rely on x in s
being true after an s.add(x)
.
But as a user, I would also be surprised to see Path("/Users/mdickinson/somefile.txt") in a.files
returns True, while I clearly see a string in the list.
To be honest, I find the current behaviour easier to understand.
But most usage is going to be in code, not at the command line: there will be no-one to "see" a string in the list.
I think we have a genuine practical problem here. Suppose you have a List(File)
trait. Perhaps it's a list of DNA sequence files currently selected for processing; perhaps it's a list of files that are known to be open and need to be closed during cleanup.
Suppose that, for whatever reason, you're given a new file to add to that list, and that file comes in the form of a pathlib.Path
instance p
. At some point in your code, you've done myobj.dna_files.append(p)
(perhaps the user clicked on a corresponding checkbox, or selected it from a file dialog, or ...).
How do you then programmatically undo that operation, given that myobj.dna_files.remove(p)
won't work?
I think that surprise would also occur while this usage goes into code, and manual inspection is often the tool people would use for debugging. Let's say the file actually comes in the form of string and we have normalization for __contains__
:
>>> a.files = ["/Users/name/file"]
>>> Path("/Users/name/file") in a.files
True
>>> any(path == Path("/Users/name/file") for path in a.files)
False
Wouldn't that be surprising?
How do you then programmatically undo that operation, given that myobj.dna_files.remove(p) won't work?
Then I would need to convince traits not to mess with my data when they go in, maybe like this:
class A(HasTraits):
files = List(Union(Instance(Path), File))
Or even just this if I only need to deal with Path
.
class A(HasTraits):
files = List(Instance(Path))
Wouldn't that be surprising?
Yes, it would. I don't think anything's going to remove all the surprises. But I think we can rid of most this way, as well as achieve conceptual simplicity (making the behaviour fit better into people's brains).
I wouldn't want to end up recommending that people write List(Union(Instance(Path), File))
; it seems like a smell that such a complicated solution would be necessary. We want the obvious thing, List(File)
to just work, as far as that's possible.
Perhaps we should remove the understanding of path-like objects from File
? Or introduce CFile
for such things, and then disallow it in containers?
I am all in for not doing normalization at all for containers and recommend this pattern.
We can have a strict version for all the trait types that current does normalization, e.g. we can have StrictFile
that only does validation without messing with the value.
File
might need to stay the way it is for backward compatibility.
File might need to stay the way it is for backward compatibility.
~Oh no File
is not released yet. :)~ Nope, it has been there, I don't know what I am doing.
I am all in for not doing normalization at all for containers
I don't think that's an option, given that things like Int
, Float
and Bool
normalise.
Deferring the discussion to 6.2.0