Marek Horst issues

Results: 82 issues of Marek Horst

Perform test mining for EOSC services

Originally requested in Redmine: [#5385](https://issue.openaire.research-infrastructures.eu/issues/5385). We should create a dedicated branch, define an Oozie workflow, and integrate both the database and the script attached to the Redmine ticket.

activity: explore

Extract plaintexts from NLM records provided by SN as zip packages

We should find the most convenient way to read a bunch of zip files from HDFS (ideally straight from S3) and build an Avro datastore with `DocumentText` records holding all extracted NLMs.

activity: explore
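
One possible starting point for the zip-reading part, as a minimal sketch rather than the chosen solution: it assumes each package fits in memory as bytes (however those bytes are fetched from HDFS or S3) and that NLM records are the `.xml` entries of the archive. The function name `extract_plaintexts` is hypothetical, and the `(name, text)` pairs only stand in for `DocumentText` records; writing them to Avro is left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_plaintexts(zip_bytes):
    """Yield (entry_name, plaintext) for every XML entry of a zip package.

    Hypothetical helper: the pairs stand in for DocumentText records.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # assumption: only .xml entries are NLM records
            root = ET.fromstring(archive.read(name))
            # itertext() visits all text nodes in document order;
            # split/join collapses runs of whitespace into single spaces.
            text = " ".join(" ".join(root.itertext()).split())
            yield name, text
```

Joining text nodes with a space is a simplification: inline markup inside a word would introduce a spurious space, and records that use entities defined only in an external DTD would fail to parse with this parser.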

Avoid introducing patent and software duplicates when exporting entities

Currently the only entities exported by IIS are patent and software entities. Both are the outcome of patent and software matching. Software entities are built based on the metadata encoded...

activity: impl

functionality: export

Align all caching modules implemented in spark to rely on dataframes

Some of the caching solutions currently implemented in Spark, namely `CachedWebCrawlerJob` and `PatentMetadataRetrieverJob`, rely on RDDs, while we could take advantage of the full potential of Spark 2 dataframes as...

activity: refactor

functionality: core

Find a proper way of dealing with sharelib jars conflicting with user jars

Since I was unable to find a decent solution to this problem in #987, where the proposed fix was just a workaround to make affiliation matching work again, we should...

activity: explore

functionality: core

Change the way dateOfCollection is defined for patent entities

Currently, according to the [mapping spreadsheet](https://docs.google.com/spreadsheets/d/1iSLJeyltEjoqyUwtyw0eARmcCFejH2g8Lj1ltR9f5TU/edit#gid=0), both the `dateofcollection` and `dateoftransformation` fields in the `Patent` entity are set to the same static value provided as the `export_patent_date_of_collection` parameter, which is currently defined in...

activity: impl

functionality: referenceextraction

functionality: export

import.concepts report entry is missing in JSON report

This issue affects only the JSON report representation (the value is properly stored in Avro reports and exported to Prometheus) and is caused by the existence of the `import.concepts.duration` report entry...

activity: tiny bug

functionality: execution reports

Skip exporting mining outcome for disabled algorithms

Currently the exporting phase is totally unaware of an algorithm's status, that is, whether a given algorithm was enabled or disabled. Each algorithm produces an outcome regardless of being disabled or enabled (empty outcome...

activity: impl

functionality: export
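
The idea of making the export phase aware of algorithm status can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function name `select_exportable`, the flag mapping, and the algorithm names are hypothetical and do not reflect how IIS actually represents algorithm status or outcomes.

```python
def select_exportable(outcomes, enabled):
    """Keep only the outcomes of algorithms that are explicitly enabled.

    outcomes: dict mapping a (hypothetical) algorithm name to its records
    enabled:  dict mapping an algorithm name to a boolean status flag
    Algorithms missing from `enabled` are treated as disabled.
    """
    return {
        name: records
        for name, records in outcomes.items()
        if enabled.get(name, False)
    }


# Usage: a disabled algorithm still yields an (empty) outcome, but it is skipped.
outcomes = {"software": ["rec-1"], "patent": []}
flags = {"software": True, "patent": False}
exportable = select_exportable(outcomes, flags)
```

Treating unknown algorithms as disabled is a design choice of this sketch; the opposite default may be preferable if new algorithms should be exported without extra configuration.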

Consider replacing Result entities exported as actions payload with concrete entity types

After replacing the old protobuf-based Oaf model with the new dhp oaf model, we could take one more step towards further performance optimization. This optimization could be gained mostly...

activity: impl

functionality: export

Align all the DocumentToConceptId references to a single schema usage

This is a follow-up to #1067. Originally (a long time ago) there was only one `DocumentToConceptId` schema definition, located at `eu.dnetlib.iis.referenceextraction.researchinitiative.schemas.DocumentToConceptId` and used by the researchinitiative reference extraction algorithm. At some point it was...

functionality: referenceextraction