Marek Horst
Currently the citation matching algorithm is written in Spark 1.6, as part of the CoAnSys module: https://github.com/CeON/CoAnSys/tree/master/citation-matching/citation-matching-core-code We should rewrite the code in Spark 2.4 (used by all the other spark...
Originally requested on redmine: https://support.openaire.eu/issues/9931#note-3
This task is basically about incorporating #1532 and updating the integration test suite where needed. Originally requested in: https://support.openaire.eu/issues/10503#note-9.
Since https://github.com/openaire/iis/pull/1036 lacks a corresponding issue and that PR targets a quite outdated codebase, I am reviving this activity with a new issue and a dedicated branch. This thread was...
According to the redmine ticket: https://support.openaire.eu/issues/10546 affiliation matching has failed during the very last phase of report generation due to exceeding the predefined spark.network.timeout set to 600 seconds: ```...
We should integrate a caching mechanism into the fuzzy citation matching algorithm in order to reduce the number of bibliographic references which need to be matched. We can do this...
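The general idea of skipping references that already have a cached outcome can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the keying scheme (a hash of the normalized reference text), the in-memory `dict` standing in for the cache, and the names `reference_key` and `split_by_cache` are all hypothetical and not taken from the issue or from the IIS codebase.

```python
import hashlib
import re


def reference_key(raw_text: str) -> str:
    """Hypothetical cache key: SHA-1 of the reference text after
    collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", raw_text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def split_by_cache(references, cache):
    """Separate references with a cached outcome from those still to be matched.

    references: iterable of (reference_id, raw_text) pairs
    cache: dict mapping reference_key -> matched document id
           (None would record a cached "no match" outcome)
    """
    cached, to_match = {}, []
    for ref_id, raw_text in references:
        key = reference_key(raw_text)
        if key in cache:
            # Outcome already known: reuse it and skip fuzzy matching.
            cached[ref_id] = cache[key]
        else:
            to_match.append((ref_id, raw_text))
    return cached, to_match


# Made-up sample data: one reference is already cached, one is new.
cache = {reference_key("Smith J. Some title. 2001."): "doc-1"}
refs = [
    ("r1", "Smith J.  some title. 2001."),
    ("r2", "Doe A. Another title. 2010."),
]
cached, to_match = split_by_cache(refs, cache)
```

Only the entries in `to_match` would then be passed to the fuzzy matcher, and their outcomes written back to the cache under the same key.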
Originally reported on redmine: https://support.openaire.eu/issues/10396. It seems most of the false positives are due to loose organization matching ("Università degli Studi di X") where most of the words of...
This is a direct follow-up to the Grobid-based metadata extraction integration (#1512). Since rebuilding the whole metadata extraction cache from scratch is an extremely time-consuming task, we might need to...
Originally requested in: https://support.openaire.eu/issues/9631#note-7
We could start with dividing the set of required changes into sub-topics: * [DONE] client code responsible for communicating with Grobid server and sending PDF contents for parsing (this is...