Consider replacing compound match strength with the highest match strength for a group of voters
The affiliation matching algorithm is driven by the concept of matchers and voters. Each matcher consists of multiple voters, each with an assigned match strength.
When multiple voters vote for a match, we recalculate the match strength for the given affiliation and organization pair using the following formula:
https://github.com/openaire/iis/blob/871021ef3f34a6ae232e8dfb8c8a3579400cb0fa/iis-wf/iis-wf-affmatching/src/main/java/eu/dnetlib/iis/wf/affmatching/match/AffOrgMatchStrengthRecalculator.java#L39.
After many analyses of the affmatching output, we noticed that whenever a false positive was reported, the final match strength of the matched affiliation and organization pair was almost always extremely high. This makes it difficult to define a reasonable match strength threshold below which matches could be excluded from export back to the graph.
Currently, some voters defined for a matcher are similar (e.g. strict matching and Levenshtein distance matching) in that they work on the same part of the affiliation and organization names. Building a compound match strength from such voters for a given pair may artificially increase the match strength, because their votes are not independent evidence.
We should consider addressing this match strength "inflation" by reducing the final match strength of a pair whenever multiple voters voted for a match, e.g. by taking the highest match strength among the voters claiming the match is valid.
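
To illustrate the difference, here is a minimal sketch comparing the two approaches. It assumes, without this having been checked against the linked source, that the current recalculation behaves like a probabilistic sum (`a + b - a * b`) applied voter by voter. The class `MatchStrengthSketch` and its methods are hypothetical names used only for illustration and are not part of the affmatching module.

```java
import java.util.List;

// Hypothetical illustration only; not part of iis-wf-affmatching.
class MatchStrengthSketch {

    // ASSUMPTION: models the current compound recalculation as a
    // probabilistic sum (a + b - a * b). The linked
    // AffOrgMatchStrengthRecalculator is the authoritative definition.
    static float compound(List<Float> voterStrengths) {
        float result = 0f;
        for (float strength : voterStrengths) {
            result = result + strength - result * strength;
        }
        return result;
    }

    // Proposed alternative: keep only the strongest voter's match strength.
    static float highest(List<Float> voterStrengths) {
        float result = 0f;
        for (float strength : voterStrengths) {
            result = Math.max(result, strength);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two similar voters (e.g. strict and Levenshtein name matching)
        // voting for the same affiliation and organization pair.
        List<Float> strengths = List.of(0.8f, 0.7f);
        System.out.println("compound: " + compound(strengths));
        System.out.println("highest:  " + highest(strengths));
    }
}
```

Under that assumption, two voters with strengths 0.8 and 0.7 compound to roughly 0.94, while the highest-strength approach keeps 0.8. That leaves more room for a threshold to separate weak matches from strong ones.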