Marek Horst
Originally requested in redmine: [#5385](https://issue.openaire.research-infrastructures.eu/issues/5385). We should create a dedicated branch, define an oozie workflow, and integrate both the database and the script provided as attachments to the redmine ticket.
We should find the most convenient way to read a bunch of zip files from HDFS (ideally straight from S3) and build an avro datastore with `DocumentText` records holding all extracted NLMs.
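As a rough illustration of the zip-to-record step only, here is a minimal sketch that walks an in-memory zip archive and yields one `DocumentText`-like record per NLM file. Everything beyond the `DocumentText` name is an assumption made for the example: the `id` and `text` field names, the `.nxml` suffix, the id derived from the entry name, and the helper names `iter_document_texts` and `build_sample_zip`. Reading from HDFS or S3 and writing the avro datastore are deliberately left out.

```python
import io
import zipfile
from typing import Dict, Iterator


def iter_document_texts(zip_bytes: bytes, suffix: str = ".nxml") -> Iterator[Dict[str, str]]:
    """Yield one DocumentText-like dict per matching entry in a zip archive.

    The 'id' and 'text' keys are assumed field names, not a confirmed schema.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not an NLM file.
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            text = archive.read(info).decode("utf-8")
            # Hypothetical id scheme: the entry name without its suffix.
            yield {"id": info.filename[: -len(suffix)], "text": text}


def build_sample_zip() -> bytes:
    """Build a tiny archive in memory so the sketch runs without HDFS or S3."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("doc-1.nxml", "<article>first</article>")
        archive.writestr("doc-2.nxml", "<article>second</article>")
        archive.writestr("README.txt", "not an NLM file")
    return buffer.getvalue()


records = list(iter_document_texts(build_sample_zip()))
for record in records:
    print(record["id"], len(record["text"]))
```

Working on raw bytes keeps the function independent of where the archive came from, so the same logic could in principle be applied to whatever binary-file reader the job ends up using.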
Currently the only entities exported by IIS are patent and software entities. Both are the outcome of patent and software matching. Software entities are built based on the metadata encoded...
Some of the caching solutions currently implemented in spark, namely `CachedWebCrawlerJob` and `PatentMetadataRetrieverJob`, rely on RDDs while we could take advantage of the full potential of spark2 dataframes as...
Since I was unable to find a decent solution to this problem in #987, where the proposed fix was just a workaround to make affiliation matching work again, we should...
Currently, according to the [mapping spreadsheet](https://docs.google.com/spreadsheets/d/1iSLJeyltEjoqyUwtyw0eARmcCFejH2g8Lj1ltR9f5TU/edit#gid=0), both the `dateofcollection` and `dateoftransformation` fields in the `Patent` entity are set to the same static value provided as the `export_patent_date_of_collection` parameter, which is currently defined in...
This issue affects only the JSON report representation (the value is properly stored in avro reports and exported to prometheus) and is caused by the existence of the `import.concepts.duration` report entry...
Currently the exporting phase is totally unaware of an algorithm's status, that is, whether a given algorithm was enabled or disabled. Each algorithm produces an outcome regardless of being disabled or enabled (empty outcome...
After replacing the old protobuf-based Oaf model with the new dhp oaf model we could take one more step in further performance optimization. This optimization could be gained mostly...
This is a #1067 follow-up. Originally (a long time ago) there was only one `DocumentToConceptId` schema definition, located at `eu.dnetlib.iis.referenceextraction.researchinitiative.schemas.DocumentToConceptId` and used by the researchinitiative reference extraction algorithm. At some point it was...