
Change the way dateOfCollection is defined for patent entities

Open · marekhorst opened this issue · 0 comments

Currently, according to the mapping spreadsheet, both of the following fields in the Patent entity:

  • dateofcollection
  • dateoftransformation

are set to the same static value, provided as the export_patent_date_of_collection parameter, which is currently defined in the config-default.xml file.

This approach dates from the time when all patent details were provided as IIS input in a JSON file stored on HDFS.

Since patent mining was reimplemented in #1070 to rely on a TSV dump (which provides patent identifiers only), we might want to change how collection and transformation dates are assigned to the records: the metadata of each patent record is now retrieved from the EPO endpoint during IIS mining (and will be cached once caching is introduced).

We should decide whether dateofcollection should be bound to:

  1. the creation time of the input TSV dump (the dump is created twice a year), or
  2. the time the record is retrieved from the EPO endpoint.
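A minimal sketch of how dateofcollection could be derived per record under the current approach and the two options above. All names here (CollectionDateSource, CachedEpoRecord, date_of_collection, and the patent identifier in the demo) are hypothetical and not part of IIS. The sketch assumes that, for option 2, each cache entry keeps the time of the original EPO fetch; otherwise a cache hit would make a record look freshly collected.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class CollectionDateSource(Enum):
    """Hypothetical switch between the candidate definitions of dateofcollection."""
    STATIC_PARAMETER = "static_parameter"    # current behaviour (export_patent_date_of_collection)
    TSV_DUMP_CREATION = "tsv_dump_creation"  # option 1
    EPO_RETRIEVAL = "epo_retrieval"          # option 2


@dataclass(frozen=True)
class CachedEpoRecord:
    """Hypothetical cache entry: EPO metadata plus the time it was actually fetched."""
    patent_id: str
    metadata: dict
    retrieved_at: datetime


def date_of_collection(
    source: CollectionDateSource,
    record: CachedEpoRecord,
    static_value: Optional[str] = None,
    tsv_dump_created_at: Optional[datetime] = None,
) -> str:
    """Return dateofcollection for one patent record as a string."""
    if source is CollectionDateSource.STATIC_PARAMETER:
        if static_value is None:
            raise ValueError("a static value is required for STATIC_PARAMETER")
        # Same configured value for every record, regardless of when it was mined.
        return static_value
    if source is CollectionDateSource.TSV_DUMP_CREATION:
        if tsv_dump_created_at is None:
            raise ValueError("the TSV dump creation time is required for TSV_DUMP_CREATION")
        # Same value for every record extracted from one dump.
        return tsv_dump_created_at.isoformat()
    if source is CollectionDateSource.EPO_RETRIEVAL:
        # Use the original fetch time stored with the cache entry,
        # not the time the cached entry was read.
        return record.retrieved_at.isoformat()
    raise ValueError(f"unsupported source: {source}")


if __name__ == "__main__":
    rec = CachedEpoRecord("EP0000000", {}, datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc))
    dump_time = datetime(2020, 3, 15, tzinfo=timezone.utc)
    print(date_of_collection(CollectionDateSource.TSV_DUMP_CREATION, rec,
                             tsv_dump_created_at=dump_time))  # 2020-03-15T00:00:00+00:00
    print(date_of_collection(CollectionDateSource.EPO_RETRIEVAL, rec))  # 2020-06-01T12:00:00+00:00
```

Option 1 yields one value for every record coming from a given dump, while option 2 varies per record and depends on when the cache entry was first populated.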

marekhorst · Jun 17 '20 16:06