Marek Horst

Results 82 issues of Marek Horst

Currently the information generated by the `AffOrgMatchVoterStrengthEstimatorAndTest` is logged at the `TRACE` level, which is not enabled by default. This makes working on reestimation of the voters' strength rather...

functionality: affiliations

It turned out that if the first attempt of the `CachedWebCrawlerJob` failed due to a shuffle service connectivity issue: ``` 2024-11-16 22:38:14,911 [shuffle-client-6-1] ERROR org.apache.spark.network.client.TransportResponseHandler - Still have 1 requests outstanding when...

functionality: referenceextraction

This could be considered a #1475 follow-up, because citation matching was the last module written in Spark 1.6. The following properties defined in workflow.xml files: ``` spark2ExtraListeners com.cloudera.spark.lineage.NavigatorAppListener spark 2.*...

Originally requested in: https://support.openaire.eu/issues/10757. The goal is to integrate the Data Availability Statement (DAS) text-mining module for the Uppsala (SciLifeLab) tender.

functionality: referenceextraction

The TEI record produced by Grobid includes, apart from the publication metadata, the version of Grobid that created the given TEI XML record: ``` GROBID - A machine...

functionality: metadataextraction

During extensive tests it turned out that all Grobid communication-related errors are stored as `Fault`s in the cache, which causes the empty metadata extracted for a given PDF to be permanently stored...

functionality: metadataextraction

Currently the `TeiToExtractedDocumentMetadataTransformer`, working on top of the Grobid TEI XML output, parses the authors defined in the bibliographic reference section by traversing the XML author subelement: ``` Biosynthesis of...

functionality: metadataextraction

Since context profiles were removed from the D-Net Information System, we can completely remove the legacy ISLookup-based concepts importer and make the newly introduced streaming-API-based importer a...

This is a #1560 follow-up. Grobid-based metadata extraction needs to set an appropriate `extractedBy` field value. Important remark: exception handling for Grobid-based metadata extraction results in setting an empty record....

functionality: metadataextraction

Originally requested in Redmine: https://support.openaire.eu/issues/9871#note-10. The idea is to implement and integrate a workflow responsible for: * reading HTML landing pages from tar.gz packages stored by the PDF Aggregation System...

activity: impl