iis

Consider replacing Result entities exported as action payloads with concrete entity types

Open • marekhorst opened this issue 5 years ago • 0 comments

After replacing the old protobuf-based Oaf model with the new dhp Oaf model, we could take another step toward further performance optimization.

Most of the gain would come within the action manager subsystem, which applies the actions emitted by IIS (among other producers) to the materialized graph.

Apart from relations (exported as the Relation class), only the following newly created entities are emitted with a concrete entity type:

  • matched patents as eu.dnetlib.dhp.schema.oaf.Publication
  • matched software entities as eu.dnetlib.dhp.schema.oaf.Software

All the other actions, which convey entity payloads with updates to existing entities and are generated by the following mining algorithms:

  • citations matching (encoded as extra info with XML payload)
  • documents classification (encoded as subject)
  • pdb mining (encoded as external reference)
  • covid-19 reference extraction (encoded as context)
  • concepts matching (encoded as context)
  • communities reference extraction (encoded as context)
  • research initiative reference extraction (encoded as context)

are exported as eu.dnetlib.dhp.schema.oaf.Result entities.

Even though this is allowed, it may hurt action manager performance. Because the real type behind a Result is unknown and each concrete type is stored separately on HDFS, the action manager has to apply those actions to every concrete type: Publication, Software and Dataset.
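To illustrate the cost, here is a minimal Java sketch that models the action manager with one in-memory store per concrete type. Everything in it (`ActionApplicationCost`, `applyUntyped`, `applyTyped`, the string payloads) is hypothetical: the maps only stand in for the separate HDFS locations, and this is not the action manager's actual code. An action whose payload is known only as a Result has to be checked against every store, while a typed action touches exactly one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-type storage; not the real action manager code.
final class ActionApplicationCost {

    // Concrete result types named in the issue, each kept in its own store
    // (a stand-in for the separate HDFS locations).
    static final List<String> CONCRETE_TYPES = List.of("Publication", "Software", "Dataset");

    // type -> (entity id -> updates applied to that entity)
    final Map<String, Map<String, List<String>>> stores = new HashMap<>();

    // How many per-type stores were visited: a rough proxy for the work done.
    int storesScanned = 0;

    ActionApplicationCost() {
        for (String type : CONCRETE_TYPES) {
            stores.put(type, new HashMap<>());
        }
    }

    void addEntity(String type, String id) {
        stores.get(type).put(id, new ArrayList<>());
    }

    // Payload typed only as Result: the real type is unknown, so every store is visited.
    boolean applyUntyped(String id, String update) {
        boolean applied = false;
        for (String type : CONCRETE_TYPES) {
            storesScanned++;
            List<String> updates = stores.get(type).get(id);
            if (updates != null) {
                updates.add(update);
                applied = true;
            }
        }
        return applied;
    }

    // Payload carrying its concrete type: only the matching store is visited.
    boolean applyTyped(String type, String id, String update) {
        storesScanned++;
        List<String> updates = stores.get(type).get(id);
        if (updates == null) {
            return false;
        }
        updates.add(update);
        return true;
    }

    public static void main(String[] args) {
        ActionApplicationCost untyped = new ActionApplicationCost();
        untyped.addEntity("Dataset", "d1");
        untyped.applyUntyped("d1", "subject:x");
        System.out.println("untyped scans: " + untyped.storesScanned); // prints "untyped scans: 3"

        ActionApplicationCost typed = new ActionApplicationCost();
        typed.addEntity("Dataset", "d1");
        typed.applyTyped("Dataset", "d1", "subject:x");
        System.out.println("typed scans: " + typed.storesScanned); // prints "typed scans: 1"
    }
}
```

In the real system each "scan" corresponds to processing a whole per-type dataset rather than a map lookup, so the overhead of untyped payloads grows with the size of every concrete-type table, not just the one holding the entity.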

Since these are updates to existing entities, we could preserve the concrete entity type imported into IIS somewhere in the IIS model and rely on it when producing the entity instance, instead of falling back to the rather abstract Result type.
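One possible shape of this proposal, as a minimal Java sketch: the concrete type is recorded when an entity is imported into IIS and later used to pick the class of the exported payload. The `ResultType` enum, `FACTORIES` map and `buildUpdate` method are hypothetical, and the nested `Result`, `Publication`, `Software` and `Dataset` classes are simplified stand-ins rather than `eu.dnetlib.dhp.schema.oaf.*` themselves; where IIS would actually store the preserved type is left open, as in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; the nested classes are simplified stand-ins for the dhp Oaf model.
final class ConcreteTypePreservation {

    static class Result {
        String id;
        final List<String> subjects = new ArrayList<>();
    }

    static class Publication extends Result {}
    static class Software extends Result {}
    static class Dataset extends Result {}

    // Hypothetical: the concrete type recorded by IIS when the entity is imported from the graph.
    enum ResultType { PUBLICATION, SOFTWARE, DATASET }

    // Maps each preserved type to a constructor of the matching concrete class.
    static final Map<ResultType, Supplier<Result>> FACTORIES = Map.of(
            ResultType.PUBLICATION, Publication::new,
            ResultType.SOFTWARE, Software::new,
            ResultType.DATASET, Dataset::new);

    // Builds an update payload using the preserved type instead of the abstract Result.
    static Result buildUpdate(String id, ResultType type, String subject) {
        Result result = FACTORIES.get(type).get();
        result.id = id;
        result.subjects.add(subject);
        return result;
    }

    public static void main(String[] args) {
        Result update = buildUpdate("result-1", ResultType.DATASET, "classification:xyz");
        System.out.println(update.getClass().getSimpleName()); // prints "Dataset"
    }
}
```

A lookup table keyed by the preserved type keeps the mining algorithms unchanged: each one still fills in its own fields (subject, context, external reference, and so on), and only the construction of the payload object depends on the recorded type.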

marekhorst, May 15 '20 11:05