Marek Horst

Results 82 issues of Marek Horst

Originally requested in Redmine: [#5385](https://issue.openaire.research-infrastructures.eu/issues/5385). We should create a dedicated branch, define an Oozie workflow, and integrate both the database and the script attached to the Redmine ticket.

activity: explore

We should find the most convenient way to read a bunch of zip files from HDFS (ideally straight from S3) and build an avro datastore with `DocumentText` records holding all extracted NLMs.

activity: explore
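
A minimal sketch of the zip-to-record step described above, not the IIS implementation. It assumes each archive is already available as raw bytes (for instance fetched from HDFS or S3 by whatever client is chosen), that NLM entries can be recognized by a file suffix, and that a `DocumentText` record can be approximated by a dict with `id` and `text` keys. The function name `document_texts` and the `.xml` suffix are hypothetical, and writing the records to avro is left out.

```python
import io
import zipfile
from typing import Dict, Iterable, Iterator


def document_texts(archives: Iterable[bytes], suffix: str = ".xml") -> Iterator[Dict[str, str]]:
    """Yield one DocumentText-like dict per matching entry of each zip archive.

    `suffix` is an assumed naming convention for NLM files, not taken from the issue.
    """
    for raw in archives:
        # Open the archive from memory so no local temporary file is needed.
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            for info in zf.infolist():
                # Skip directories and entries that do not look like NLM files.
                if info.is_dir() or not info.filename.endswith(suffix):
                    continue
                # Use the entry name as the record id and the decoded content as text.
                yield {"id": info.filename, "text": zf.read(info).decode("utf-8")}
```

Working on in-memory bytes keeps the sketch independent of the storage layer, so the same function could be applied per file inside a distributed job.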

Currently the only entities exported by IIS are patent and software entities. Both are the outcome of patent and software matching. Software entities are built based on the metadata encoded...

activity: impl
functionality: export

Some of the currently implemented caching solutions in Spark, namely `CachedWebCrawlerJob` and `PatentMetadataRetrieverJob`, rely on RDDs, while we could take advantage of the full potential of Spark 2 DataFrames as...

activity: refactor
functionality: core

Since I was unable to find a decent solution to this problem in #987, where the proposed fix was just a workaround to make affiliation matching work again, we should...

activity: explore
functionality: core

Currently, according to the [mapping spreadsheet](https://docs.google.com/spreadsheets/d/1iSLJeyltEjoqyUwtyw0eARmcCFejH2g8Lj1ltR9f5TU/edit#gid=0), both fields in the `Patent` entity, `dateofcollection` and `dateoftransformation`, are set to the same static value provided as the `export_patent_date_of_collection` parameter, which is currently defined in...

activity: impl
functionality: referenceextraction
functionality: export

This issue is related only to JSON report representation (the value is properly stored in avro reports and exported to prometheus) and is caused by the `import.concepts.duration` report entry existence...

activity: tiny bug
functionality: execution reports

Currently the exporting phase is totally unaware of an algorithm's status and of whether a given algorithm was enabled or disabled. Each algorithm produces an outcome regardless of being disabled or enabled (empty outcome...

activity: impl
functionality: export

After replacing the old protobuf-based Oaf model with the new dhp oaf model, we could take one more step in performance optimization. This optimization could be gained mostly...

activity: impl
functionality: export

This is a follow-up to #1067. Originally (a long time ago) there was only one `DocumentToConceptId` schema definition, located at `eu.dnetlib.iis.referenceextraction.researchinitiative.schemas.DocumentToConceptId` and used by the researchinitiative reference extraction algorithm. At some point it was...

functionality: referenceextraction