penman icon indicating copy to clipboard operation
penman copied to clipboard

Add containment for trees and graphs

Open BramVanroy opened this issue 2 years ago • 3 comments

Added simple containment __contains__ for graphs and trees. Tests are added, too.

For graphs:

  • str: whether a given string is part of the graph as a terminal instance
  • tuple (or Triple): whether it is part of self.triples
  • NotImplemented in other cases - so testing for subgraphs is not yet implement. Open to suggestions how to do that though. I was thinking of just doing triplet matching, but perhaps we want to be less strict and allow a match with different varnames as well?

For trees:

  • str: whether a given string is part of the tree as a terminal instance

I get the terminal nodes in the tree like this:

terminals = [target for _, (role, target) in self.walk() if role == "/" and is_atomic(target)]

which feels elaborate so perhaps there is a simpler way? Happy to change it if there is!

closes #10 (partially, at least, since subgraph testing is not included in this PR)

BramVanroy avatar Aug 07 '23 07:08 BramVanroy

Failing lint: which command should I run? I see that flake8 is in the setup.py dependencies but I am not sure which arguments to run it with for your preferences.

Failing tests: seems unrelated to this PR (an error about py36 not available).

BramVanroy avatar Aug 07 '23 07:08 BramVanroy

@goodmami I have considered your comments and I like the suggestion of just allowing for triple matching in the graphs only, and then also allowing users to specify "None" in a triple to ignore that field. Two questions/concerns:

  • Is it ensured that None is never a valid value in a triple? Because if it is, we cannot use this approach (because users might want to explicitly match on None)
  • In terms of typing, this would require a tiny additional change. Currently Triple only allows str as the Variable and Role and None is not valid. So for valid typing, this should change - or I change the signature of contains so that item can only be a Tuple[Variable|None, Constant, Role|None].

What do you think?

PS: linter fails on files that I have not touched, so not sure how I can help that.

BramVanroy avatar Oct 23 '23 13:10 BramVanroy

  • Is it ensured that None is never a valid value in a triple? Because if it is, we cannot use this approach (because users might want to explicitly match on None)

None is a valid value, where it indicates something is missing (and Penman sometimes logs a warning in these cases):

>>> import penman
>>> penman.decode('()').triples
[(None, ':instance', None)]
>>> penman.decode('(a)').triples
[('a', ':instance', None)]
>>> penman.decode('(a :ARG0)').triples
Missing target: (a :ARG0)
[('a', ':instance', None), ('a', ':ARG0', None)]

I don't think roles are ever None, though, now that they are canonically written with the ::

>>> penman.decode('(a :)').triples
Missing target: (a :)
[('a', ':instance', None), ('a', ':', None)]

So, yes, it's true that you cannot use the graph methods like Graph.edges() to look for those that have a literal None value because it is overloaded in these methods to be a wildcard. The current workaround is to look through Graph.triples manually. We could use a different sentinel value for the wildcard, such as Ellipsis, but it would be a backward incompatible change to make None function as a literal value, so there would need to be a deprecation period. I think the workaround suffices for now.

  • In terms of typing, this would require a tiny additional change. Currently Triple only allows str as the Variable and Role and None is not valid. So for valid typing, this should change - or I change the signature of contains so that item can only be a Tuple[Variable|None, Constant, Role|None].

I think you're right. Let's hold off on changing Role to allow for None since I don't think that's possible (correct me if I'm wrong). But for the triple source, penman.types.BasicTriple also needs to be similarly changed.

goodmami avatar Oct 24 '23 04:10 goodmami

@BramVanroy sorry if you were waiting on something from me. I thought there were some unresolved requests, such as making a dedicated method for containment checks instead of __contains__(). In any case, since the checks are fairly basic, I think existing mechanisms make them quite simple:

>>> import penman
>>> g = penman.decode("(a / A :ARG0 (b / B))")
>>> ("a", ":ARG0", "b") in g.triples  # triple in graph
True
>>> "B" in {t.target for t in g.instances()}  # set of node labels
True

goodmami avatar May 15 '25 16:05 goodmami

Hi @goodmami. Sorry I do not have time to work on this anymore since I am not actively working with penman-like trees anymore. I was cleaning up my repositories and so also deleted my fork of penman. As you mention, checking for containment is relatively easy as-is already!

BramVanroy avatar May 16 '25 06:05 BramVanroy

@BramVanroy got it. And thanks for your contributions!

goodmami avatar May 16 '25 16:05 goodmami