mik
mik copied to clipboard
Move 'include_migrated_from_uri' config option from metadata parser class to a metadata manipulator
include_migrated_from_uri setting in CONTENTdm to MODS metadata parser uses a hard-coded hostname, and the form of the MODS element created is specific to SFU's requirements. We have a pattern already for adding identifiers via metadata manipulators in the AddUuidToMods. We should create a metadata manuiplator that adds the migrated from URL.
@MarcusBarnes I can probably find time to take this on, over the long weekend at the latest.
@mjordan Great - that would be helpful. Thank you.
@MarcusBarnes can you test branch issue-176? You will need to run composer dump-autoload
as I have added a new class file. If you run it on newspapers, please make sure that both the issue and page level MODS contain the migrated from URL.
Wiki page for this new manipulator is at https://github.com/MarcusBarnes/mik/wiki/Metadata-manipulator:-AddCdmMigratedFromUrl.
@mjordan Awesome - I will test.
@mjordan My test on a newspaper has the migrated from URL appearing in the issue level MODS, but not the page level MODS.
OK, thanks for testing, I'll take a look.
@mjordan Since the page level MODS is not generated using the mapping, we'll have to conditionally check for the specific manipulator and add some code like that for adding the UUID here:
https://github.com/MarcusBarnes/mik/blob/master/src/metadataparsers/mods/CdmToMods.php#L246
OK, I hadn't notice that. Thanks for pointing it out.
Perhaps we can use the InsertXmlFromTemplate to add this, using a template like:
<identifier type='uri' displayLabel='Migrated from'>http://example.com/item/{{ somefield }}</identifier>
We could document this in a cookbook entry.
@mjordan Thanks. Are you able to test your suggested approach?
Yup, just tested and works as expected. Should I document how to use the InsertXmlFromTemplate manipulator for this, and then remove the hard-wired code in the Cdm* classes?
@mjordan Yes please. This is a better approach.
Will do. I only have to modify the CdmToMods.php metadata parser. If you don't mind, I'll delete the issue-176 branch and cut a new one to do this work in.
@mjordan That's fine. Thank you.
We still have the problem of applying manipulators to page-level metadata. I'll see if I can address that as well in this branch.
@jonathangreen and @MarcusBarnes, looking at this, I'd like to recommend that we create a new metadatamanipulator that uses a Twig template to add the XML fragment to the MODS in the CDM toolchains for the "migrated from" value. The template can take two values, the collection alias and the object or page pointer, both of which are available in the function where the respective MODS is generated. The same template could be used for both the parent and page level fragments. A typical template would look like this:
<identifier type="uri" invalid="yes" displayLabel="Migrated From">http://contentdm.org.net/cdm/ref/collection/{{ collection_alias }}/id/{{ pointer }}</identifier>
This is not as generalized as the InsertXmlFromTemplate manipulator, but I don't think that is suitable for page-level MODS since within createPageLevelModsXML()
only $page_pointer
and $page_title
are available. We already check specifically for the AddUuidToMods manipulator so checking for one more isn't the worst thing we can do. I'd personally like to see more flexibility here for generating custom page-level MODS but without good use cases I'd rather not build that in. If we had some good use cases, however, we should consider doing that.
I'll note that at the time we wrote the CdmToMods metadata parser, we hadn't introduced Twig into MIK. Now that it's there, adding another Twig-based manipulator isn't that hard.
In the .ini file, we'd deprecate include_migrated_from_uri
and instead register the new metadata manipulator like any other, in the [MANIPULATORS]
section like
metadatamanipulators[] = "AddContentdmSourceUrl|null4|/tmp/twigtemplates/migrated_from.xml"
Thoughts?
Seems very reasonable to me!
I'm not sure how many people use the 'include_migrated_from_uri', but just in case, we should tag a point release before merging the result of work on this. That way, if they need to, they can get a copy of the tagged release so that people who are currently using that in a workflow can still use their current workflow until they have the chance to modify their workflows to use the new suggested metadatamanipulator.
Sounds good. @MarcusBarnes if you want to cut a point release, I can work on this.
I've drafted a point release (visible to MIK maintainers) at https://github.com/MarcusBarnes/mik/releases/tag/untagged-50ba6f22dabb99534a02. Once we have a pull-request that addresses this issue, we can publish the point v0.9.1 point release and merge the reviewed pull-request.
Making progress on this, but the new template XML is overwriting other <identifier>
elements. Will continue.
This work is making me remember that the CdmBooks toolchain is probably the least polished of our toolchains. I'm going to do a wee bit of cleanup while I'm in there if that's OK.
Thank you for the update and thank you in advance for your work on this.
Update: the new metadatamanipluator works, but it interferes with other metadatamanipulators. Now all I have to do is figure out why it's doing that and make it stop :face_with_head_bandage: