
Rewrite citation matching algorithm in Spark 2.4

Open marekhorst opened this issue 1 year ago • 0 comments

Currently the citation matching algorithm is written in Spark 1.6, as part of the CoAnSys module:

https://github.com/CeON/CoAnSys/tree/master/citation-matching/citation-matching-core-code

We should rewrite the code in Spark 2.4 (used by all the other Spark modules in IIS) so that we can set timeouts (such as `spark.shuffle.registration.timeout`, which is available since Spark 2.3) and take advantage of performance improvements.
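As a rough illustration of what the upgrade would make possible, the sketch below sets the shuffle registration properties when building a Spark 2.x session. The object name, application name, and the concrete values are hypothetical and would need tuning for the actual cluster; only the two property names come from the Spark configuration (both available since Spark 2.3).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; not part of CoAnSys or IIS.
object CitationMatchingSessionSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied externally, e.g. by spark-submit.
    val spark = SparkSession.builder()
      .appName("citation-matching-sketch") // assumed name
      // Timeout in milliseconds for registering with the external shuffle service.
      // 60000 is an illustrative value, not a recommendation.
      .config("spark.shuffle.registration.timeout", "60000")
      // Number of registration attempts before giving up (illustrative value).
      .config("spark.shuffle.registration.maxAttempts", "5")
      .getOrCreate()

    // ... the citation matching job would run here ...

    spark.stop()
  }
}
```

The same properties could equally be passed as `--conf` options to `spark-submit`, which keeps the tuning out of the code.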

At present the citation matching algorithm cannot be run on a graph of the current size because of timeouts related to the shuffle server.

marekhorst, Sep 25 '24 14:09