Add containment for trees and graphs
Added simple containment __contains__ for graphs and trees. Tests are added, too.
For graphs:
str: whether a given string is part of the graph as a terminal instance- tuple (or Triple): whether it is part of self.triples
NotImplementedin other cases - so testing for subgraphs is not yet implement. Open to suggestions how to do that though. I was thinking of just doing triplet matching, but perhaps we want to be less strict and allow a match with different varnames as well?
For trees:
str: whether a given string is part of the tree as a terminal instance
I get the terminal nodes in the tree like this:
terminals = [target for _, (role, target) in self.walk() if role == "/" and is_atomic(target)]
which feels elaborate so perhaps there is a simpler way? Happy to change it if there is!
closes #10 (partially, at least, since subgraph testing is not included in this PR)
Failing lint: which command should I run? I see that flake8 is in the setup.py dependencies but I am not sure which arguments to run it with for your preferences.
Failing tests: seems unrelated to this PR (an error about py36 not available).
@goodmami I have considered your comments and I like the suggestion of just allowing for triple matching in the graphs only, and then also allowing users to specify "None" in a triple to ignore that field. Two questions/concerns:
- Is it ensured that None is never a valid value in a triple? Because if it is, we cannot use this approach (because users might want to explicitly match on None)
- In terms of typing, this would require a tiny additional change. Currently
Tripleonly allowsstras the Variable and Role and None is not valid. So for valid typing, this should change - or I change the signature ofcontainsso thatitemcan only be aTuple[Variable|None, Constant, Role|None].
What do you think?
PS: linter fails on files that I have not touched, so not sure how I can help that.
- Is it ensured that None is never a valid value in a triple? Because if it is, we cannot use this approach (because users might want to explicitly match on None)
None is a valid value, where it indicates something is missing (and Penman sometimes logs a warning in these cases):
>>> import penman
>>> penman.decode('()').triples
[(None, ':instance', None)]
>>> penman.decode('(a)').triples
[('a', ':instance', None)]
>>> penman.decode('(a :ARG0)').triples
Missing target: (a :ARG0)
[('a', ':instance', None), ('a', ':ARG0', None)]
I don't think roles are ever None, though, now that they are canonically written with the ::
>>> penman.decode('(a :)').triples
Missing target: (a :)
[('a', ':instance', None), ('a', ':', None)]
So, yes, it's true that you cannot use the graph methods like Graph.edges() to look for those that have a literal None value because it is overloaded in these methods to be a wildcard. The current workaround is to look through Graph.triples manually. We could use a different sentinel value for the wildcard, such as Ellipsis, but it would be a backward incompatible change to make None function as a literal value, so there would need to be a deprecation period. I think the workaround suffices for now.
- In terms of typing, this would require a tiny additional change. Currently Triple only allows str as the Variable and Role and None is not valid. So for valid typing, this should change - or I change the signature of contains so that item can only be a Tuple[Variable|None, Constant, Role|None].
I think you're right. Let's hold off on changing Role to allow for None since I don't think that's possible (correct me if I'm wrong). But for the triple source, penman.types.BasicTriple also needs to be similarly changed.
@BramVanroy sorry if you were waiting on something from me. I thought there were some unresolved requests, such as making a dedicated method for containment checks instead of __contains__(). In any case, since the checks are fairly basic, I think existing mechanisms make them quite simple:
>>> import penman
>>> g = penman.decode("(a / A :ARG0 (b / B))")
>>> ("a", ":ARG0", "b") in g.triples # triple in graph
True
>>> "B" in {t.target for t in g.instances()} # set of node labels
True
Hi @goodmami. Sorry I do not have time to work on this anymore since I am not actively working with penman-like trees anymore. I was cleaning up my repositories and so also deleted my fork of penman. As you mention, checking for containment is relatively easy as-is already!
@BramVanroy got it. And thanks for your contributions!