beam icon indicating copy to clipboard operation
beam copied to clipboard

Kececioglu matcher

Open bw800402 opened this issue 5 years ago • 3 comments

KececiogluMatching is a depth-first implementation of Edmond's maximum matching algorithm with early-termination test heuristic as described in "Computing maximum-cardinality matchings in sparse general graphs." Proceedings of the 2nd Workshop on Algorithm Engineering, 121-132, 1998. https://www2.cs.arizona.edu/~kece/Research/abstracts.html#KP98 and https://www2.cs.arizona.edu/~kece/Research/papers/KP98.ps

The hope was that this algorithm might improve the speed of matching for chemical graphs despite their low degree. Thus it was designed to act as a drop in replacement for the MaximumMatching class. However, on my system the algorithm performs nearly identically in terms of execution speed as the MaximumMatcher adapted from Eppstein when run against the 5616 ChEMBL smiles that require the MaximumMatching algorithm. Of the 5616 smiles in ChEMBL requiring the algorithm, 256 result in a different matching when using KececiogluMatching and manual inspection of a random subset of those indicates that they are valid resonance forms. Given the lack of performance gain, I did not incorporate the KececiogluMatching into Localise, but the code is written and could be useful to someone and maybe there is performance yet to squeeze from it.

I created new tests that mimic MaximumMatchingTest and all pass.

bw800402 avatar Jun 18 '20 21:06 bw800402

Need some time to review but looks very interesting, thanks for contributing. The minimal speed gain is likely because of the approximate matching first and this covers almost all cases. In fact in Beam v2 (which will be out at some point) I just use an exhaustive search to clean up the Kekulization/Localisation of bonds since often the greedy matching will find a solution with only two unmatched nodes - in which case the task is just to see if there is a path between them.

From my thesis:

image

johnmay avatar Jun 19 '20 07:06 johnmay

It's not obvious from the links I provided but I was essentially basing this implementation on the prior C implementation described in the paper which was provided by the author here: https://www2.cs.arizona.edu/~kece/Research/code/matching1.sh.Z

bw800402 avatar Jun 22 '20 18:06 bw800402

Sorry been busy so only just looked at this, could you add a copyright/license header to the new files. Also I would add a comment pointing to the C implementation.

johnmay avatar Jul 04 '20 09:07 johnmay