zenodo-rdm
search-parsing: migrate missing redirections
Extracted from https://github.com/zenodo/zenodo-rdm/issues/102
The following terms were not migrated. A small explanation can be found in comments.
[
"contributors.affiliation",# The mapping did not work since 'metadata.contributors.affiliations.name' yields a field not found when parsing the query. It does work in ES, though.
"contributors.name",# Same as above with the mapping metadata.contributors.person_or_org.name.
"contributors.type",# Same as above with the mapping metadata.contributors.person_or_org.role.id
"creators.name",# Same as above with the mapping metadata.creators.person_or_org.name
"contributors.orcid",# I am not sure what to map here, given that this only applies to records whose contributors have "identifiers.scheme == 'orcid'"
"creators.orcid",# same as above
"filecount",# record files are under a new link, I am not sure we can query records and retrieve the number of files
"filename",# same as above
"filetype",# same as above
"size",# seems to relate to files that are under a new link
"grants.acronym",# The mapping did not work since metadata.funding.award.id yields a field not found.
"imprint.\*",# I think it's not used anymore or relates to a type of resource
"imprint.place",# same as above
"imprint.publisher",# same as above
"isbn",# same as above
"journal.\*",# same as above
"journal.issue",# same as above
"journal.pages",# same as above
"journal.title",# same as above
"journal.volume",# same as above
"journal.year",# same as above
"relations.version.count",# versions is a new link now records/<recid>/versions
"meeting.\*",# not used anymore
"meeting.acronym",# not used anymore
"meeting.dates",# not used anymore
"meeting.place",# not used anymore
"meeting.session_part",# not used anymore
"meeting.session",# not used anymore
"meeting.title",# not used anymore
"meeting.url",# not used anymore
"part_of.\*",# not used anymore
"part_of.pages",# not used anymore
"part_of.title",# not used anymore
"references.\*",# not used anymore
"Resource ",# not used anymore
"notes",# does not exist anymore ?
"subject.term",# I am not sure what's the equivalent mapping. Perhaps subject?
"subject.identifier",# I am not sure what's the equivalent mapping. Perhaps subject?
"owners",# I am not sure what's the equivalent mapping here. Perhaps creators ?
"access_conditions", # does not match any term in rdm, comments are only allowed for embargoed records
"grants.program", # should be 'metadata.funding.award.program', which does not exist
"grants.funder.acronyms", # should be 'metadata.funding.award.acronym', which does not exist
]
- [ ] To evaluate if the mechanism needed for the above impacts the current query parsing mechanism in RDM and an enhanced parsing should be implemented.
see comment: https://github.com/zenodo/zenodo-rdm/issues/102#issuecomment-1352886639
"access_conditions", # 'access_conditions' does not exist anymore. It was a description on why the record was restricted. Only available for embargoed records.
Agree, the most fitting field would be "embargo reason", but it doesn't match 100%. This depends on what we decide about the migration (or not) of the "restricted access" records (i.e. the feature where a user can request access to a closed access record under some conditions).
I would remove it completely and move it over to the "failed migrations" so we address it later.
"access_right", # maps to "access.status". However access.status does not work (e.g. query by 'access.status:"open"' yields nothing) and the redirection works (e.g. 'access_right:"open"' yields records).
For the migrated records on ZenodoRDM we don't have many different cases of `access.status`, but the fields mapping is correct.
What needs some tuning is translating the actual search values:
- (Legacy: RDM)
- `open`: `open`
- `embargoed`: `embargoed`
- `restricted`: `restricted` (this doesn't mean the same though... see the point about `access_conditions`)
- `closed`: `restricted` 👈❗
- N/A: `metadata-only`
That's probably new functionality for the query parser though, since it's about changing the original search values (not just fields/terms).
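To make the value translation concrete, here is a minimal sketch of how a legacy `access_right` term could be rewritten into an `access.status` term with the values listed above. The names `LEGACY_ACCESS_VALUES` and `translate_access_right` are hypothetical and not part of the existing query parser, and the regex only handles simple `field:value` terms.

```python
import re

# Hypothetical lookup built from the list above: legacy value -> RDM value.
LEGACY_ACCESS_VALUES = {
    "open": "open",
    "embargoed": "embargoed",
    "restricted": "restricted",
    "closed": "restricted",  # legacy "closed" has no direct RDM counterpart
}

# Matches access_right:value and access_right:"value".
_ACCESS_RIGHT_TERM = re.compile(r'\baccess_right:"?(\w+)"?')


def translate_access_right(query: str) -> str:
    """Rewrite access_right terms to access.status, translating the values."""

    def _replace(match: re.Match) -> str:
        legacy_value = match.group(1)
        # Values missing from the lookup are passed through unchanged.
        rdm_value = LEGACY_ACCESS_VALUES.get(legacy_value, legacy_value)
        return f'access.status:"{rdm_value}"'

    return _ACCESS_RIGHT_TERM.sub(_replace, query)
```

For example, `translate_access_right('access_right:"closed"')` would produce `access.status:"restricted"`. A real implementation would more likely hook into the parser's term handling than rely on a regex.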
"grants.program", # "metadata.funding.id": validate whether it's the correct mapping
This is incorrect, but this information is also not currently available in the metadata of awards (it should be something like `metadata.funding.award.program`).
"grants.title", # "metadata.funding.award.title.en" : validate whether it's the correct mapping
Yes, that looks correct.
"grants.funder.acronyms", # "metadata.funding.funder.id" : validate whether it's the correct mapping
This should be `metadata.funding.award.acronym`, but the field is missing from the relation configuration and the record mapping.
I'm also a bit concerned that the legacy field is `.acronyms` (plural) and in RDM we have `.acronym` (singular). Probably it's good to assume an array (since it also doesn't make a difference for the mapping).
"grants.funder.doi", # "metadata.funding.id" : validate whether it's the correct mapping
This one can be `metadata.funding.funder.identifiers.identifier` (since the funder DOI is included there).
"license.license", # "metadata.rights.title.en" : validate whether it's the correct mapping
Funnily, there is no such field in the actual legacy API mapping 🙃
So no need to map (we should fix that in the legacy search guide).
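As a rough summary of the points above, the redirections could be tracked along these lines. This is only a sketch: `CONFIRMED_REDIRECTIONS`, `PENDING_TERMS` and `redirect_field` are hypothetical names, not the structure used in zenodo-rdm.

```python
from typing import Optional

# Field redirections confirmed in this comment (legacy term -> RDM field).
CONFIRMED_REDIRECTIONS = {
    "access_right": "access.status",
    "grants.title": "metadata.funding.award.title.en",
    "grants.funder.doi": "metadata.funding.funder.identifiers.identifier",
}

# Terms set aside until a matching RDM field exists or a decision is made.
PENDING_TERMS = {
    "access_conditions",
    "grants.program",
    "grants.funder.acronyms",
}


def redirect_field(legacy_field: str) -> Optional[str]:
    """Return the RDM field for a legacy term, or None if it is not migrated."""
    return CONFIRMED_REDIRECTIONS.get(legacy_field)
```

`license.license` is deliberately left out of both collections, since it does not exist in the legacy mapping.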
See https://github.com/zenodo/zenodo-rdm/issues/102#issuecomment-1352886639 for further information on missing redirections