Introduce caching for the fuzzy citation matching algorithm

Open marekhorst opened this issue 8 months ago • 0 comments

We should integrate a caching mechanism into the fuzzy citation matching algorithm to reduce the number of bibliographic references that need to be matched. We can do this by caching the citation matching outcome and removing already matched entries from the input of each consecutive citation matching run.

We could also cache the unmatched references, recording that no match was found for them. This should be limited to older publications (older than 3 or 5 years; the threshold could be parameterized), for which we are fairly confident that no new potential matches will be introduced into the graph.
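
A minimal sketch of the idea, for discussion only. It is written in plain Python rather than against the actual IIS code, and every name in it (`Citation`, `citation_id`, `pub_year`, `run_with_cache`, `min_age_years`) is hypothetical. It assumes each reference has a stable key and a known publication year, and it uses an in-memory dict where a real run would need a persistent store.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Citation:
    citation_id: str         # hypothetical stable key, e.g. a hash of the raw reference text
    pub_year: Optional[int]  # assumed publication year tied to the reference, if known


# Cache maps citation_id -> matched document id, or None for "known to be unmatched".
Cache = Dict[str, Optional[str]]


def run_with_cache(
    citations: Iterable[Citation],
    cache: Cache,
    match_fn: Callable[[Citation], Optional[str]],
    today: date,
    min_age_years: int = 5,  # parameterized age threshold (e.g. 3 or 5)
) -> Dict[str, Optional[str]]:
    """Match only citations without a cached outcome, then update the cache."""
    results: Dict[str, Optional[str]] = {}
    for c in citations:
        if c.citation_id in cache:
            # Cached outcome (matched or known-unmatched): skip the expensive matching.
            results[c.citation_id] = cache[c.citation_id]
            continue
        matched_id = match_fn(c)
        results[c.citation_id] = matched_id
        old_enough = (
            c.pub_year is not None and today.year - c.pub_year > min_age_years
        )
        # Always cache matches; cache "no match" only for sufficiently old publications.
        if matched_id is not None or old_enough:
            cache[c.citation_id] = matched_id
    return results
```

In this sketch a recent unmatched reference is matched again on every run, since a new candidate may still enter the graph, while an old unmatched reference is skipped after its first run.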

More details are available in the citation matching section of the IIS performance and stability optimization plan.

marekhorst, Apr 16 '25 11:04