Marek Horst issues

Results: 82 issues of Marek Horst

Rewrite citation matching algorithm in Spark 2.4

Currently the citation matching algorithm is written in Spark 1.6 as part of the CoAnSys module (https://github.com/CeON/CoAnSys/tree/master/citation-matching/citation-matching-core-code). We should rewrite the code in Spark 2.4 (used by all the other spark...

functionality: citation-matching

Integrate the pre-registration text mining for the UKRN Pilot 6

Originally requested on redmine: https://support.openaire.eu/issues/9931#note-3

functionality: referenceextraction

Alter regex for UKRI subfunder in the buildprojectdb.sql script which is part of the reference extraction mining workflow

This task is basically about incorporating #1532 and updating the integration test suite where needed. Originally requested in: https://support.openaire.eu/issues/10503#note-9.

functionality: referenceextraction

Integrate inference algorithm for RIF (previously RPF) with projects reference extraction

Since https://github.com/openaire/iis/pull/1036 lacks a corresponding issue and that PR targets a quite outdated codebase, I am reviving this activity with a new issue and a dedicated branch. This thread was...

functionality: referenceextraction

Repartition affiliation matching final output before generating reports

According to the redmine ticket https://support.openaire.eu/issues/10546, affiliation matching failed during the very last phase of report generation because it exceeded the predefined spark.network.timeout, set to 600 seconds: ...

functionality: affiliations

Introduce caching for the fuzzy citation matching algorithm

We should integrate a caching mechanism into the fuzzy citation matching algorithm in order to reduce the number of bibliographic references that need to be matched. We can do this...

functionality: citation-matching
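
As a rough illustration of the caching idea only: the issue text above is truncated and does not describe the intended design, so everything below is an assumption. The sketch keys an in-memory cache on a normalized form of the raw reference text, so that a reference seen before skips the matcher. The names `normalize`, `cache_key`, `match_with_cache` and the dict-based cache are hypothetical and are not taken from the IIS codebase.

```python
import hashlib
import re
from typing import Callable, Dict, Iterable, Optional


def normalize(raw_reference: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies share a key."""
    return re.sub(r"\s+", " ", raw_reference.strip().lower())


def cache_key(raw_reference: str) -> str:
    """Stable key for a reference: SHA-256 of its normalized text."""
    return hashlib.sha256(normalize(raw_reference).encode("utf-8")).hexdigest()


def match_with_cache(
    references: Iterable[str],
    matcher: Callable[[str], Optional[str]],
    cache: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Run the (expensive) matcher only for references whose key is not cached.

    `matcher` is a stand-in for the fuzzy matching step (hypothetical); it
    returns a matched document id, or None when nothing matches.
    """
    results: Dict[str, Optional[str]] = {}
    for ref in references:
        key = cache_key(ref)
        if key not in cache:
            # Cache misses are the only references that reach the matcher.
            cache[key] = matcher(ref)
        results[ref] = cache[key]
    return results
```

Negative results (`None`) are cached as well in this sketch, which would need an invalidation policy if the set of candidate documents grows between runs.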

Fix the loose organization name matching for the Italian university case

Originally reported on redmine: https://support.openaire.eu/issues/10396. It seems that most of the false positives are due to loose organization matching ("Università degli Studi di X") where most of the words of...

functionality: affiliations
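
To make the reported failure mode concrete, here is a minimal sketch, not the actual affiliation matching code: a plain token-overlap (Jaccard) score rates two different Italian universities as similar because the generic words dominate, while ignoring those words separates them. The `GENERIC_WORDS` list and all function names are hypothetical choices made for this example.

```python
import re
import unicodedata
from typing import FrozenSet, Set

# Hypothetical list of generic words shared by many Italian university names.
GENERIC_WORDS: FrozenSet[str] = frozenset(
    unicodedata.normalize("NFC", w) for w in ("università", "degli", "studi", "di")
)


def tokens(name: str) -> Set[str]:
    """Split a name into lowercase word tokens (NFC-normalized first)."""
    return set(re.findall(r"\w+", unicodedata.normalize("NFC", name).lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def loose_similarity(name1: str, name2: str) -> float:
    """Overlap over all tokens, generic words included."""
    return jaccard(tokens(name1), tokens(name2))


def strict_similarity(name1: str, name2: str) -> float:
    """Overlap over distinctive tokens only, with generic words removed."""
    return jaccard(tokens(name1) - GENERIC_WORDS, tokens(name2) - GENERIC_WORDS)
```

For "Università degli Studi di Padova" and "Università degli Studi di Torino", `loose_similarity` gives 4/6 (about 0.67) while `strict_similarity` gives 0.0, since the only distinctive tokens differ.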

Prepare a metadata extraction cache remover workflow responsible for removing pre-selected cache entries, or update the cache_builder workflow to work in an "overwrite" mode

This is a direct follow-up to the Grobid-based metadata extraction integration (#1512). Since rebuilding the whole metadata extraction cache from scratch is an extremely time-consuming task we might need to...

functionality: metadataextraction

Perform EC ERASMUS+ mining test

Originally requested in: https://support.openaire.eu/issues/9631#note-7

functionality: referenceextraction

Run experiments with Grobid deployed as a server

We could start by dividing the set of required changes into sub-topics:
* [DONE] client code responsible for communicating with the Grobid server and sending PDF contents for parsing (this is...