Investigate approximate matching
Allow the user to specify a tolerance epsilon on matching cost, and find a matching whose cost is at most epsilon more than the cost of the optimal matching.
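One way to get that guarantee, sketched under assumptions (this is not how Graphtage is implemented; `Candidate`, `tighten()`, and `approximate_best()` are hypothetical names): suppose each candidate matching's cost is only known as an interval [lower, upper] that can be tightened iteratively. The search can then stop as soon as the best upper bound is within epsilon of the smallest lower bound, because the optimal cost can never be below that smallest lower bound.

```python
from typing import List


class Candidate:
    """Hypothetical candidate matching whose exact cost is known only within [lower, upper]."""

    def __init__(self, name: str, exact_cost: int) -> None:
        self.name = name
        # Hidden from the search; used only to simulate tightening. Assumed non-negative.
        self.exact_cost = exact_cost
        self.lower = 0                      # loose initial lower bound
        self.upper = 2 * exact_cost + 10    # loose initial upper bound
        self.tighten_calls = 0

    def gap(self) -> int:
        return self.upper - self.lower

    def tighten(self) -> None:
        """Move each bound one step toward the exact cost (stand-in for real refinement work)."""
        self.tighten_calls += 1
        if self.lower < self.exact_cost:
            self.lower += 1
        if self.upper > self.exact_cost:
            self.upper -= 1


def approximate_best(candidates: List[Candidate], epsilon: int) -> Candidate:
    """Return a candidate whose cost is at most `epsilon` above the optimal cost."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    while True:
        best = min(candidates, key=lambda c: c.upper)
        # No candidate, including the optimal one, can cost less than this.
        global_lower = min(c.lower for c in candidates)
        if best.upper - global_lower <= epsilon:
            return best  # best.exact_cost <= best.upper <= optimum + epsilon
        # Spend work on the candidate whose cost is least certain.
        max(candidates, key=Candidate.gap).tighten()
```

A larger epsilon lets the loop stop earlier; with epsilon=0 it keeps tightening until the returned matching is provably optimal.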
How do you plan to make approximate matching work? Would the user provide a function that takes two nodes and returns a "distance" factor?
I have a use case in mind for a tree of content nodes that carry a special content_id attribute, which I can use for exact matching: if nodeA.content_id == nodeB.content_id, the match is 100% (distance 0).
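A minimal sketch of that kind of user-supplied function, plus a cheap pre-pass that pairs nodes sharing a content_id before any more expensive matching runs. `ContentNode`, `distance()`, and `pair_by_content_id()` are hypothetical names, not part of Graphtage's API, and the sketch assumes content_id values are unique within each tree.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ContentNode:
    # Hypothetical node type; only content_id comes from the use case above.
    content_id: str
    text: str


def distance(a: ContentNode, b: ContentNode) -> float:
    """Hypothetical user-supplied distance: 0.0 for the same content_id, else 1.0.

    A real function might compare node contents when the IDs differ.
    """
    return 0.0 if a.content_id == b.content_id else 1.0


def pair_by_content_id(
    from_nodes: List[ContentNode], to_nodes: List[ContentNode]
) -> Tuple[List[Tuple[ContentNode, ContentNode]], List[ContentNode], List[ContentNode]]:
    """Pair nodes with identical content_id; return (pairs, unmatched_from, unmatched_to)."""
    # Assumes content_id is unique within to_nodes; duplicates would overwrite each other.
    index = {node.content_id: node for node in to_nodes}
    pairs: List[Tuple[ContentNode, ContentNode]] = []
    unmatched_from: List[ContentNode] = []
    for node in from_nodes:
        partner = index.pop(node.content_id, None)
        if partner is None:
            unmatched_from.append(node)
        else:
            pairs.append((node, partner))
    # Whatever is left in the index had no counterpart in from_nodes.
    return pairs, unmatched_from, list(index.values())
```

Only the leftover, unpaired nodes would then need a full edit-distance search.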
I posted some links about that here, and I'm looking forward to trying graphtage on the tree fixtures I have.
Graphtage already has an internal notion of edit distance, which is what it uses to output its progress bar when run from a TTY. The idea would be to:
- allow the user to specify a maximum edit distance (defaulting to zero), and produce a result that is at most that distance from optimal; and/or
- immediately print out the best solution found thus far when Graphtage receives a SIGTERM.
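The second idea amounts to an anytime search. A minimal sketch using Python's standard `signal` module, assuming the search produces successively discovered (cost, solution) pairs; `StopFlag`, `install_sigterm_handler()`, and `anytime_search()` are hypothetical and not Graphtage internals.

```python
import signal
from typing import Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


class StopFlag:
    """Set from a signal handler; polled by the search loop."""

    def __init__(self) -> None:
        self.requested = False

    def request(self, signum=None, frame=None) -> None:
        # Accepts (signum, frame), which is what signal.signal() passes to handlers.
        self.requested = True


def install_sigterm_handler(flag: StopFlag) -> None:
    # Must be called from the main thread; Python runs signal handlers there.
    signal.signal(signal.SIGTERM, flag.request)


def anytime_search(
    solutions: Iterable[Tuple[float, T]], flag: StopFlag
) -> Optional[Tuple[float, T]]:
    """Keep the cheapest (cost, solution) seen so far; stop early once the flag is set."""
    best: Optional[Tuple[float, T]] = None
    for cost, solution in solutions:
        if best is None or cost < best[0]:
            best = (cost, solution)
        if flag.requested:
            break
    return best
```

On SIGTERM the loop exits after its current step and the caller can print `best`, instead of the process exiting with nothing to show.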
It sounds like you might be trying to do something slightly different, though. You might actually be able to do what you want using Graphtage's not-yet-very-well-documented --match-if argument. If you can provide a more detailed example of your input files, I can try to give you an example of its usage.