
search-parsing: migrate missing redirections

Open zzacharo opened this issue 2 years ago • 1 comments

Extracted from https://github.com/zenodo/zenodo-rdm/issues/102

The following terms were not migrated. A small explanation can be found in comments.

[
  "contributors.affiliation",# The mapping did not work since 'metadata.contributors.affiliations.name' yields a "field not found" error when parsing the query, although it works in ES.
  "contributors.name",# Same as above with the mapping metadata.contributors.person_or_org.name.
  "contributors.type",# Same as above with the mapping metadata.contributors.person_or_org.role.id
  "creators.name",# Same as above with the mapping metadata.creators.person_or_org.name
  "contributors.orcid",# I am not sure what to map here, given that this only applies to records whose contributors have "identifiers.scheme == 'orcid'"
  "creators.orcid",# same as above
  "filecount",# record files are under a new link, I am not sure we can query records and retrieve the number of files
  "filename",# same as above
  "filetype",# same as above
  "size",# seems to relate to files that are under a new link
  "grants.acronym",# The mapping did not work since metadata.funding.award.id yields a field not found.
  "imprint.*",# I think it's not used anymore or relates to a type of resource
  "imprint.place",# same as above
  "imprint.publisher",# same as above
  "isbn",# same as above
  "journal.*",# same as above
  "journal.issue",# same as above
  "journal.pages",# same as above
  "journal.title",# same as above
  "journal.volume",# same as above
  "journal.year",# same as above
  "relations.version.count",# versions is a new link now records/<recid>/versions
  "meeting.*",# not used anymore
  "meeting.acronym",# not used anymore
  "meeting.dates",# not used anymore
  "meeting.place",# not used anymore
  "meeting.session_part",# not used anymore
  "meeting.session",# not used anymore
  "meeting.title",# not used anymore
  "meeting.url",# not used anymore
  "part_of.*",# not used anymore
  "part_of.pages",# not used anymore
  "part_of.title",# not used anymore
  "references.*",# not used anymore
  "Resource ",# not used anymore
  "notes",# does not exist anymore?
  "subject.term",# I am not sure what's the equivalent mapping. Perhaps subject?
  "subject.identifier",# I am not sure what's the equivalent mapping. Perhaps subject?
  "owners",# I am not sure what's the equivalent mapping here. Perhaps creators?
  "access_conditions",  # does not match any term in RDM; comments are only allowed for embargoed records
  "grants.program",   # should be 'metadata.funding.award.program', which does not exist
  "grants.funder.acronyms",  # should be 'metadata.funding.award.acronym', which does not exist
]
  • [ ] Evaluate whether the mechanism needed for the above impacts the current query parsing mechanism in RDM and whether an enhanced parser should be implemented.

see comment: https://github.com/zenodo/zenodo-rdm/issues/102#issuecomment-1352886639

"access_conditions", # 'access_conditions' does not exist anymore. It was a description of why the record was restricted. Only available for embargoed records.

Agree, the most fitting field would be "embargo reason", but it doesn't match 100%. This depends on what we decide about migrating (or not) the "restricted access" records (i.e. the feature where a user can request access to a closed-access record under some conditions).

I would remove it completely and move it over to the "failed migrations" so we can address it later.

"access_right", # maps to "access.status". However access.status does not work (e.g. query by 'access.status:"open"' yields nothing) and the redirection works (e.g. 'access_right:"open"' yields records).

For the migrated records on ZenodoRDM we don't have many different cases of access.status, but the fields mapping is correct.

What needs some tuning is translating the actual search values:

  • (Legacy: RDM)
  • open: open
  • embargoed: embargoed
  • restricted: restricted (this doesn't mean the same though... see the point about access_conditions)
  • closed: restricted 👈❗
  • N/A: metadata-only

Probably that's a new functionality for the query parser though, since it's about changing the original search values (not just fields/terms).
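To make the value translation concrete, here is a minimal sketch of the table above as a lookup. The function and constant names are hypothetical and are not part of the RDM query parser; the sketch assumes the parser hands over a term and its raw value as strings, and it leaves values it does not know untouched.

```python
# Hypothetical sketch: legacy `access_right` values -> RDM `access.status` values.
# The pairs come from the (Legacy: RDM) list above; "metadata-only" has no
# legacy counterpart, so nothing maps onto it.
LEGACY_ACCESS_RIGHT_TO_RDM_STATUS = {
    "open": "open",
    "embargoed": "embargoed",
    "restricted": "restricted",  # not semantically identical, see access_conditions
    "closed": "restricted",
}


def translate_access_right(value):
    """Return the RDM status for a legacy value; unknown values pass through."""
    key = value.strip().strip('"').lower()
    return LEGACY_ACCESS_RIGHT_TO_RDM_STATUS.get(key, key)


def redirect_access_right(term, value):
    """Rewrite both the term and its value; other terms are left untouched."""
    if term != "access_right":
        return term, value
    return "access.status", translate_access_right(value)
```

With this, a legacy query term such as access_right:"closed" would be rewritten to the pair ("access.status", "restricted"), which is the case flagged with 👈❗ above.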

"grants.program", # "metadata.funding.id": validate whether it's the correct mapping

This is incorrect, but this information is also not currently available in the award metadata (it should be something like metadata.funding.award.program).

"grants.title", # "metadata.funding.award.title.en" : validate whether it's the correct mapping

Yes, that looks correct.

"grants.funder.acronyms", # "metadata.funding.funder.id" : validate whether it's the correct mapping

This should be metadata.funding.award.acronym, but the field is missing from the relation configuration and the record mapping.

I'm a bit concerned also that the legacy field is .acronyms (plural) and in RDM we have .acronym (singular). Probably it's good to assume an array (since it also doesn't make a difference for the mapping).

"grants.funder.doi", # "metadata.funding.id" : validate whether it's the correct mapping

This one can be metadata.funding.funder.identifiers.identifier (since the funder DOI is included there).

"license.license", # "metadata.rights.title.en" : validate whether it's the correct mapping

Funnily, there is no such field in the actual legacy API mapping 🙃

So no need to map (we should fix that in the legacy search guide).
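As an illustration, the field mappings judged correct in this comment could be collected in a redirection table like the one below. This is a string-based sketch, not how the RDM query parser works: the names FIELD_REDIRECTIONS and redirect_fields are made up for this example, and the regular expression does not understand quoted phrases, so a real implementation would rewrite field names on the parsed query tree instead.

```python
import re

# Hypothetical redirection table, limited to the mappings confirmed above.
FIELD_REDIRECTIONS = {
    "access_right": "access.status",
    "grants.title": "metadata.funding.award.title.en",
    "grants.funder.doi": "metadata.funding.funder.identifiers.identifier",
}

# A field name is a dotted identifier immediately followed by ":".
_TERM = re.compile(r"(?<![\w.])([A-Za-z_][\w.]*)(?=:)")


def redirect_fields(query):
    """Replace legacy field names in a query string; unknown names are kept."""
    def _swap(match):
        name = match.group(1)
        return FIELD_REDIRECTIONS.get(name, name)

    return _TERM.sub(_swap, query)
```

Terms that are still unresolved (for example grants.program or grants.funder.acronyms) are deliberately left out of the table, so they pass through unchanged until the missing fields exist in the record mapping.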

zzacharo, Dec 15 '22 09:12

See https://github.com/zenodo/zenodo-rdm/issues/102#issuecomment-1352886639 for further information on missing redirections

alejandromumo, Dec 16 '22 09:12