Marek Horst
Currently the information generated by the `AffOrgMatchVoterStrengthEstimatorAndTest` is logged at the `TRACE` level, which is not enabled by default. This makes working on re-estimation of the voters' strength rather...
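One way to surface this output without raising the global log level would be a per-logger override, assuming the module uses a standard Log4j properties configuration (the logger name below is illustrative, not the actual package path):

```properties
# Hypothetical log4j.properties fragment: enable TRACE output only for the
# voter-strength estimator class, leaving the root logger level untouched.
log4j.logger.eu.dnetlib.iis.AffOrgMatchVoterStrengthEstimatorAndTest=TRACE
```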
It turned out that if the first attempt of the `CachedWebCrawlerJob` failed due to a shuffle service connectivity issue: ``` 2024-11-16 22:38:14,911 [shuffle-client-6-1] ERROR org.apache.spark.network.client.TransportResponseHandler - Still have 1 requests outstanding when...
This could be considered a #1475 follow-up, because citation matching was the last module still written in Spark 1.6. The following properties are defined in `workflow.xml` files: ``` spark2ExtraListeners com.cloudera.spark.lineage.NavigatorAppListener spark 2.*...
Originally requested in: https://support.openaire.eu/issues/10757. The goal is to integrate the Data Availability Statement (DAS) text-mining module for the Uppsala (SciLifeLab) tender.
The TEI record produced by Grobid includes, apart from the publication metadata, the version of Grobid responsible for creating a given TEI XML record: ``` GROBID - A machine...
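For orientation, in Grobid's TEI output this version information sits in the header's `<encodingDesc>` section; a minimal illustrative fragment (attribute values are placeholders, not taken from an actual record):

```xml
<teiHeader>
  <encodingDesc>
    <appInfo>
      <!-- Grobid records its own version and run timestamp here -->
      <application version="X.Y.Z" ident="GROBID" when="...">
        <desc>GROBID - A machine...</desc>
      </application>
    </appInfo>
  </encodingDesc>
</teiHeader>
```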
Avoid placing temporary errors related to communication with Grobid as permanent faults in the cache
During extensive tests it turned out that all Grobid communication-related errors are stored as `Fault`s in the cache, which causes the empty metadata extracted for a given PDF to be permanently stored...
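A minimal sketch of the intended distinction, under the assumption that communication failures can be recognized from the exception cause chain (class and method names below are hypothetical, not the actual IIS API):

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical sketch: decide whether a Grobid extraction failure should be
// cached as a permanent Fault or left uncached so the PDF is retried later.
public class FaultCachePolicy {

    /**
     * Returns true only for errors that describe the PDF itself,
     * not the transient state of the Grobid service.
     */
    public static boolean isCacheable(Throwable error) {
        // Walk the cause chain looking for transient network failures.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof ConnectException || t instanceof SocketTimeoutException) {
                return false; // temporary connectivity issue: do not cache
            }
        }
        return true; // assume a genuine, reproducible extraction fault
    }
}
```

With such a guard in place, only reproducible extraction faults would end up persisted, while connectivity hiccups would leave the cache untouched.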
Currently the `TeiToExtractedDocumentMetadataTransformer`, working on top of the Grobid TEI XML output, parses the authors defined in the bibliographic reference section by traversing the XML `author` subelement: ``` Biosynthesis of...
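The traversal can be sketched as follows; this is an illustrative stand-alone DOM version (namespace handling omitted), not the actual transformer code:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch of traversing <author> subelements in a TEI fragment;
// the real TeiToExtractedDocumentMetadataTransformer logic differs.
public class TeiAuthorSketch {

    public static List<String> extractAuthors(String teiXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(teiXml)));
            List<String> authors = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("author");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element author = (Element) nodes.item(i);
                // Combine forename and surname into a single display name.
                String forename = textOf(author, "forename");
                String surname = textOf(author, "surname");
                authors.add((forename + " " + surname).trim());
            }
            return authors;
        } catch (Exception e) {
            throw new RuntimeException("TEI parsing failed", e);
        }
    }

    private static String textOf(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() > 0 ? list.item(0).getTextContent() : "";
    }
}
```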
Since context profiles were removed from the D-Net Information System, we can completely remove the legacy ISLookup-based concepts importer and make the newly introduced streaming-API-based importer a...
This is a #1560 follow-up. Grobid-based metadata extraction needs to set an appropriate `extractedBy` field value. Important remark: exception handling for Grobid-based metadata extraction results in setting an empty record....
Originally requested in Redmine: https://support.openaire.eu/issues/9871#note-10 The idea is to implement and integrate a workflow responsible for: * reading HTML landing pages from tar.gz packages stored by the PDF Aggregation System...