
Feat/use dynamic harvest field

Open maudetes opened this issue 2 years ago • 0 comments

Fix https://github.com/etalab/data.gouv.fr/issues/818, alternative to https://github.com/opendatateam/udata/pull/2750

Uses a separate dynamic harvest document to store harvest information. The core fields are defined in /dataset/models.py. Any other entry can be added freely, but without validation.

An explicit API field definition is needed to expose a core or additional field through the API. See https://github.com/maudetes/udata/blob/67dfdda59750eb4224c07b1da4c792f41147a26f/udata/core/dataset/api_fields.py#L22 for the fields defined in udata core. Other entries can be added by modifying this field definition, e.g. in udata-ods:

from udata.api import fields
from udata.core.dataset.api_fields import dataset_harvest_fields

dataset_harvest_fields['ods_url'] = fields.String(description='The ods url for ods harvested dataset', allow_null=True)
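
To illustrate why the explicit declaration matters, here is a minimal stand-alone sketch in plain Python. It is not udata's or flask-restx's actual marshalling code; `declared_fields`, `expose_harvest` and the sample values are hypothetical. It only shows the intended behaviour: an entry stored in the dynamic document stays hidden from the API output until a field is declared for it.

```python
# Hypothetical stand-in for the API field mapping: key -> description.
# The keys are field names mentioned in this PR; descriptions are illustrative.
declared_fields = {
    "remote_id": "Identifier of the dataset on the remote platform",
    "dct_identifier": "dct:identifier of a DCAT harvested dataset",
}


def expose_harvest(harvest, fields):
    """Return only the harvest entries that have a declared API field.

    Declared keys that are absent from the stored metadata come back as None.
    """
    return {key: harvest.get(key) for key in fields}


# Stored harvest metadata: the document is dynamic, so it may hold
# entries that no API field has been declared for yet.
stored = {
    "remote_id": "abc-123",
    "dct_identifier": "abc-123",
    "ods_url": "https://example.org/d/abc-123",
}

before = expose_harvest(stored, declared_fields)  # no 'ods_url' key

# A plugin registers its extra field, as udata-ods does in the snippet above.
declared_fields["ods_url"] = "The ODS URL for ODS harvested datasets"
after = expose_harvest(stored, declared_fields)  # now includes 'ods_url'
```

With this shape, a plugin never needs to touch the stored document: it only extends the field mapping.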

Harvest dates are now stored in the harvest metadata and no longer override the object dates. We should therefore iterate on how to return the correct dates on the frontend (e.g. the max between the mongo object and the harvest metadata?).
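
One possible reading of the "max" idea, sketched in plain Python under assumptions: the function name `effective_modified` and its arguments are hypothetical, and the harvest date is assumed to be missing (`None`) for datasets that were not harvested. This is an illustration to discuss, not the chosen implementation.

```python
from datetime import datetime
from typing import Optional


def effective_modified(object_modified: datetime,
                       harvest_modified: Optional[datetime]) -> datetime:
    """Pick the date to display: the latest of the two when both exist."""
    if harvest_modified is None:
        # Not harvested, or the remote platform gave no modification date.
        return object_modified
    return max(object_modified, harvest_modified)


local = datetime(2022, 8, 1, 12, 0)
remote = datetime(2022, 8, 20, 9, 30)
shown = effective_modified(local, remote)
```

Both dates need to be either timezone-aware or naive; Python refuses to compare a mix of the two.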

TODO

  • [x] migration -> only identifying fields. Takes about 10min locally
  • [ ] Some defined fields could be replaced or merged? Ex: dct_identifier is the same as remote_id for dcat harvested datasets. Are these values needed? Made a first attempt at removing those: https://github.com/maudetes/udata/commit/a5a6d97c9bcfacd76d0fb70bc681b00de8bfd005.

maudetes, Aug 22 '22 14:08