penman icon indicating copy to clipboard operation
penman copied to clipboard

Order of triples

Open BramVanroy opened this issue 3 years ago • 3 comments

Hello there

Not so much an issue as a question: I was wondering how the triples are ordered. I assume they follow insertion order (not necessarily linguistic order) and are therefore depth-first but please correct me if I am wrong.

A nice addition (if not already possible) would be to be able to retrieve the triples (or nodes) as a depth-first or breadth-first iterable.

BramVanroy avatar Sep 24 '22 10:09 BramVanroy

Yes, they are in depth-first order on decoding, as you can see for these variations of the same graph:

>>> import penman
>>> penman.decode('(a / A :ARG0 (b / B :ARG0 (c / C)) :ARG1 c)').triples
[('a', ':instance', 'A'), ('a', ':ARG0', 'b'), ('b', ':instance', 'B'), ('b', ':ARG0', 'c'), ('c', ':instance', 'C'), ('a', ':ARG1', 'c')]
>>> penman.decode('(a / A :ARG0 (b / B :ARG0 c) :ARG1 (c / C))').triples
[('a', ':instance', 'A'), ('a', ':ARG0', 'b'), ('b', ':instance', 'B'), ('b', ':ARG0', 'c'), ('a', ':ARG1', 'c'), ('c', ':instance', 'C')]
>>> penman.decode('(a / A :ARG0 (b / B) :ARG1 (c / C :ARG0-of b))').triples
[('a', ':instance', 'A'), ('a', ':ARG0', 'b'), ('b', ':instance', 'B'), ('a', ':ARG1', 'c'), ('c', ':instance', 'C'), ('b', ':ARG0', 'c')]

The triple order does not matter in terms of graph equivalency:

>>> penman.decode('(a / A :ARG0 (b / B :ARG0 (c / C)) :ARG1 c)') == penman.decode('(a / A :ARG0 (b / B :ARG0 c) :ARG1 (c / C))')
True

The order (along with the epigraph) are merely what determines how the graph is configured as a tree, and thus how it is serialized.

Breadth-first traversal might be interesting, but is there a use case for this?

goodmami avatar Sep 24 '22 17:09 goodmami

I believe so. I was reading the SPRING paper of automatically generating text from AMR or parsing text to AMR. So in text-to-AMR, you have an input sentence and try to predict the linearized form of the graph with a language model. They try out different linearization approaches, including depth-first vs. breadth-first. Might be of interest to directly include that in penman (their library uses penman uinder the hood).

BramVanroy avatar Sep 26 '22 14:09 BramVanroy

I'm happy to review a PR that adds a function or method to iterate over the triples in other orders. We'd need to decide the API as well as the exact behavior, with tests.

As for the API, if your goal is just to get a tree or serialized PENMAN string in a different order, this could be accomplished with an appropriate key argument to penman.layout.reconfigure() or penman.layout.rearrange(). There's also penman.tree.walk(), which does a depth-first traversal of the tree, and I could imagine that taking an argument to select a different ordering. But it sounds like you want something that iterates over the triples of the graph, instead, in which case a new method on the penman.graph.Graph class might be better.

One thing to note is that while the graphs have one distinguished top node, they may have more than one top. E.g., for the dog that barked slept, both the sleep node and the bark node are top nodes:

(s / sleep
   :ARG0 (d / dog
      :ARG0-of (b / bark-01)))
graph TD;
  s-->|:ARG0|d;
  b-->|:ARG0|d;

In the graph, inverted triples are normalized (unless using something like the NoOpModel):

>>> import penman
>>> g = penman.decode('(d / dog :ARG0-of (b / bark-01))')
>>> g.triples
[('d', ':instance', 'dog'), ('b', ':ARG0', 'd'), ('b', ':instance', 'bark-01')] 

So it's not even accurate to say it's a depth-first traversal of the graph; rather, it's a traversal of the graph edges in the depth-first order of the tree. Furthermore, the graphs may have reentrancies and, while conventionally there are no directed cycles, the PENMAN format does not prevent this (e.g., (a / alpha :ARG0 a)). Given these facts, it would make more sense to implement the breadth-first traversal in the penman.tree.walk() method.

goodmami avatar Oct 08 '22 14:10 goodmami

Apologies for the delayed reply.

I have found that it is indeed "easier" to work with the a tree than with a graph, avoiding re-entries all together. I have not found the time to implement this but after having a quick look at _walk, it seems mostly straightforward. Hopefully I can make time for this later this year!

BramVanroy avatar Nov 03 '22 11:11 BramVanroy