Marek Horst

Results: 82 issues by Marek Horst

Currently, the top-level `mapredChildJavaOpts` value (e.g. defined at the `document-similarity-oap-uberworkflow` workflow level) is propagated deep down to all subworkflows and all Pig scripts. Does it mean all the subworkflows and scripts...

activity: concept

This task is meant to revert the #1454 change once the new version of `dhp-commons` is released (including `IdentifierFactory` and dependent classes).

We should consider reducing the number of files produced by the `metadataextraction` job at the cost of extending the execution time of a single `metadataextraction` task attempt. One of the...

functionality: metadataextraction

Since we were given ~Elsevier~ Springer contents (provided in JATS format), we should extend the already existing PMC ingester module (JATS compatible) and make `ArticleMetaXmlHandler` capable of handling more metadata fields...

functionality: import

Originally requested in: https://support.openaire.eu/issues/8896#note-98
This parser should be responsible for:
* assigning a proper identifier (instead of the currently used file name, which is not unique)
* extracting text out of the...

This is a #1434 follow-up. We already expose the number of imported contexts as a counter exported to Prometheus, but we might also want to export the number of missing...

functionality: import

Even though we have already introduced the `@SlowTest` marking, there are still some long-running tests executed during the regular packaging phase, namely:
* `eu.dnetlib.iis.wf.importer.infospace.ImportInformationSpaceJobTest` (87 secs)
* `eu.dnetlib.iis.wf.importer.infospace.ImportInformationSpaceJobUtilsTest` (7 secs)...

This is a #1434 follow-up. It was planned that IIS would support both concept importing modes for the time being (ISLookup based and Context Streaming endpoint based) until both...

functionality: import

This task is related to running a subset of IIS modules, currently written in Spark 2.4, on the newly available Spark 3.4 version. This may require:
* altering Oozie workflow...

The cache builder's main purpose is to allow running metadata extraction, outside the regular provisioning cycle, on a predefined set of contents. Since the PDF aggregation system replaced ObjectStore as a...

functionality: metadataextraction