Marek Horst

Results 82 issues of Marek Horst

Currently the citation matching algorithm is written in Spark 1.6, as part of the CoAnSys module: https://github.com/CeON/CoAnSys/tree/master/citation-matching/citation-matching-core-code. We should rewrite the code in Spark 2.4 (used by all the other spark...

functionality: citation-matching

Originally requested on redmine: https://support.openaire.eu/issues/9931#note-3

functionality: referenceextraction

This task is essentially about incorporating #1532 and updating the integration test suite where needed. Originally requested in: https://support.openaire.eu/issues/10503#note-9.

functionality: referenceextraction

Since https://github.com/openaire/iis/pull/1036 lacks a corresponding issue and that PR targets a quite outdated codebase, I am reviving this activity with a new issue and a dedicated branch. This thread was...

functionality: referenceextraction

According to the redmine ticket https://support.openaire.eu/issues/10546, affiliation matching failed during the very last phase of report generation because it exceeded the predefined spark.network.timeout of 600 seconds: ```...

functionality: affiliations

We should integrate a caching mechanism into the fuzzy citation matching algorithm in order to reduce the number of bibliographic references that need to be matched. We can do this...

functionality: citation-matching

Originally reported on redmine: https://support.openaire.eu/issues/10396. It seems that most of the false positives are due to loose organization matching ("Università degli Studi di X") where most of the words of...

functionality: affiliations

This is a direct follow-up to the Grobid-based metadata extraction integration (#1512). Since rebuilding the whole metadata extraction cache from scratch is an extremely time-consuming task, we might need to...

functionality: metadataextraction

Originally requested in: https://support.openaire.eu/issues/9631#note-7

functionality: referenceextraction

We could start by dividing the set of required changes into sub-topics: * [DONE] client code responsible for communicating with the Grobid server and sending PDF contents for parsing (this is...