
Retrieve tokens on the path to a matching

Open mpomarlan opened this issue 5 years ago • 1 comments

It's often useful to know not only the leaf token selected by a pattern, but also some of the intermediate tokens on the path to it. Python's re package provides a feature for exactly this, named groups, which identify particular parts of a match that may be interesting on their own.
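For reference, here is a minimal sketch of how named groups behave in Python's re module. The regular expression and the input string are made up for illustration and have nothing to do with grammaregex patterns; the point is only that each `(?P<name>...)` group ends up as a key in the result.

```python
import re

# Named groups label the interesting parts of a match.
# The pattern below is an illustrative assumption, not a grammaregex pattern.
pattern = re.compile(r"(?P<verb>\w+) (?P<prep>from|in) (?P<where>\w+)")
match = pattern.search("graduated from Wharton")

# groupdict() maps each group name to the text it captured.
captured = match.groupdict()
print(captured)
```

The dictionaries in the proposed "matchings" output play the same role as `groupdict()` here: one entry per named step.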

An example of how this might look for grammaregex:

example sentence: "Mrs. Robinson graduated from the Wharton School of the University of Pennsylvania in 1980."

pattern: ?PVBD/prep/?PIN/pobj/?P*

matchings: [{"root": "graduated", "prep": "from", "where": "School"}, {"root": "graduated", "prep": "in", "where": "1980"}]

An example implementation of this behavior can be found at this branch. It is backwards compatible: if no ?P<> appears in the pattern to match, tokens are returned as before.

mpomarlan, Jul 26 '19 13:07

meant to say,

pattern: ?P<root>VBD/prep/?P<prep>IN/pobj/?P<where>*
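To make the syntax concrete, here is a rough sketch of how such a pattern could be split into named and unnamed steps. `parse_named_steps` and `has_named_steps` are hypothetical helpers written for this illustration; they are not part of grammaregex and not necessarily how the linked branch does it. The sketch only parses the pattern string and does not walk a dependency tree.

```python
import re

# Hypothetical helper, not part of grammaregex.
# A named step looks like ?P<name>STEP; anything else is an unnamed step.
NAMED_STEP = re.compile(r"^\?P<(?P<name>\w+)>(?P<step>.*)$")

def parse_named_steps(pattern):
    """Split a '/'-separated pattern into (group_name, step) pairs.

    group_name is None for steps without a ?P<name> prefix.
    """
    steps = []
    for part in pattern.split("/"):
        m = NAMED_STEP.match(part)
        if m:
            steps.append((m.group("name"), m.group("step")))
        else:
            steps.append((None, part))
    return steps

def has_named_steps(steps):
    """True if any step is named; otherwise the old behavior can be kept."""
    return any(name is not None for name, _ in steps)

steps = parse_named_steps("?P<root>VBD/prep/?P<prep>IN/pobj/?P<where>*")
print(steps)
print(has_named_steps(steps))
```

A matcher could then check `has_named_steps` once: if it is false, return plain tokens as before; if it is true, return one dictionary per match keyed by the group names.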

mpomarlan, Jul 26 '19 13:07