Investigate approximate matching

Open ESultanik opened this issue 5 years ago • 2 comments

Allow the user to specify an epsilon on matching cost, and find a matching whose cost is at most that epsilon above the cost of the optimal matching.

ESultanik avatar May 18 '20 15:05 ESultanik

How do you plan to make approximate matching work? User provides a function that takes two nodes and returns a "distance" factor?

I have a use case in mind for a tree of content nodes that have a special content_id attribute on them which I can use for exact matching, e.g. if nodeA.content_id == nodeB.content_id the match is 100% (or distance 0).
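To make the idea concrete, here is a minimal sketch of the kind of user-supplied distance function described above. It is not Graphtage's API: `ContentNode` and `node_distance` are hypothetical names, and the fallback to title similarity when the IDs do not match is an assumption added only so the function returns something for every pair.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class ContentNode:
    """Hypothetical content node; this is not a Graphtage class."""
    title: str
    content_id: Optional[str] = None
    children: List["ContentNode"] = field(default_factory=list)


def node_distance(a: ContentNode, b: ContentNode) -> float:
    """Return a distance in [0.0, 1.0]; 0.0 means a certain match."""
    # Equal content_id values are treated as an exact match (distance 0).
    if a.content_id is not None and a.content_id == b.content_id:
        return 0.0
    # Assumption: without matching IDs, fall back to title similarity.
    return 1.0 - SequenceMatcher(None, a.title, b.title).ratio()


if __name__ == "__main__":
    old = ContentNode("Intro to fractions", content_id="abc123")
    new = ContentNode("Introduction to fractions", content_id="abc123")
    print(node_distance(old, new))  # same content_id, so distance 0.0
```

Returning a number rather than a boolean would let the same function serve both exact matching on `content_id` and fuzzy matching on everything else.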

I posted some links about that here and am looking forward to trying graphtage on the tree fixtures I have.

ivanistheone avatar Aug 29 '20 20:08 ivanistheone

Graphtage already has an internal notion of edit distance, which is what it uses to output its progress bar when run from a TTY. The idea would be to:

  1. allow the user to specify a maximum edit distance (defaulting to zero), and produce a result that is at most that distance from optimal; and/or
  2. immediately print out the best solution found thus far when Graphtage receives a SIGTERM.
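The two options above can be sketched together on a toy assignment problem. This is not how Graphtage is implemented: `approximate_matching` is a hypothetical name, the search is a brute-force enumeration of permutations, and the lower bound (the sum of each row's minimum cost) is just one simple choice. The search stops once the best matching found is provably within `epsilon` of optimal, or once a SIGTERM has been received.

```python
import itertools
import signal
from typing import Optional, Sequence, Tuple

_stop_requested = False


def _request_stop(signum, frame) -> None:
    """Signal handler: ask the search loop to return its best result so far."""
    global _stop_requested
    _stop_requested = True


def approximate_matching(
    costs: Sequence[Sequence[int]], epsilon: int = 0
) -> Tuple[int, Tuple[int, ...]]:
    """Match row i to column perm[i] in a square cost matrix.

    Returns (cost, perm). Unless interrupted, the cost is at most
    `epsilon` above the optimal cost.
    """
    n = len(costs)
    # No matching can cost less than the sum of the row minima.
    lower_bound = sum(min(row) for row in costs)
    best_cost: Optional[int] = None
    best_perm: Tuple[int, ...] = ()
    for perm in itertools.permutations(range(n)):
        cost = sum(costs[i][j] for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
        # best_cost - optimal <= best_cost - lower_bound, so this test is safe.
        if best_cost - lower_bound <= epsilon or _stop_requested:
            break
    return best_cost, best_perm


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, _request_stop)
    matrix = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
    print(approximate_matching(matrix, epsilon=0))  # exhaustive search
    print(approximate_matching(matrix, epsilon=3))  # may stop early
```

With `epsilon=0` the loop only exits early if it finds a matching that meets the lower bound exactly, so the default behaves like an exact search. A larger `epsilon` trades solution quality for time.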

It sounds like you might be trying to do something slightly different, though. You might actually be able to do what you want using Graphtage's not-yet-very-well-documented --match-if argument. If you can provide a more detailed example of input files, I could try and give you an example of its usage.

ESultanik avatar Sep 03 '20 01:09 ESultanik