
Add ability for CONTENTdm Compound writer to produce all datastreams

Open • mjordan opened this issue 8 years ago • 1 comment

Currently, the CONTENTdm compound writer only gets the 'DSID' datastream, which was a placeholder from the proof of concept defined in #111. It needs to be able to get any datastream from CONTENTdm that is defined in the [WRITER] section of the .ini file.
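To make "any datastream defined in the [WRITER] section" concrete, here is a minimal sketch of reading a list of requested datastream IDs from an .ini file. The `datastreams` key name, its comma-separated format, and the `requested_datastreams` helper are hypothetical illustrations, not MIK's documented configuration syntax.

```python
import configparser

# Hypothetical [WRITER] section: the "datastreams" key and its comma-separated
# format are assumptions for illustration, not MIK's documented syntax.
SAMPLE_INI = """
[WRITER]
class = CdmCompound
output_directory = /tmp/output
datastreams = MODS, OBJ, TN
"""


def requested_datastreams(ini_text: str) -> list[str]:
    """Return the datastream IDs listed in the [WRITER] section, in order."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("WRITER", "datastreams", fallback="")
    # Split on commas, trim whitespace, drop empty entries, normalise to upper case.
    return [ds.strip().upper() for ds in raw.split(",") if ds.strip()]


if __name__ == "__main__":
    print(requested_datastreams(SAMPLE_INI))  # ['MODS', 'OBJ', 'TN']
```

The writer would then loop over this list and fetch each datastream from CONTENTdm, which is exactly the step the rest of the thread questions.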

mjordan, Apr 08 '16 00:04

Looking into this a bit more, I am leaning toward having this toolchain produce only

  • the MODS datastream and
  • the OBJ datastream corresponding to the main file associated with the CONTENTdm object
    • or, if configured to do so, get the OBJ (master) from the local filesystem
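The two-file output described in the list above can be sketched as follows: each object yields a MODS file plus an OBJ that comes either from CONTENTdm or from a local master. The `plan_package` helper, the `IngestPackage` type, and the convention that local masters are named after the CONTENTdm pointer with a `.tif` extension are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class IngestPackage:
    """The pair of files planned for one migrated object."""
    mods_path: Path
    obj_source: str  # a CONTENTdm file URL or a local filesystem path


def plan_package(pointer: str, output_dir: Path, cdm_file_url: str,
                 local_master_dir: Optional[Path] = None,
                 master_extension: str = ".tif") -> IngestPackage:
    """Plan the MODS + OBJ pair for a single CONTENTdm object (hypothetical helper)."""
    mods_path = output_dir / f"{pointer}.xml"
    if local_master_dir is not None:
        # Configured to use local masters; assumes files are named after the
        # CONTENTdm pointer (hypothetical naming convention).
        obj_source = str(local_master_dir / f"{pointer}{master_extension}")
    else:
        # Otherwise take the main file associated with the CONTENTdm object.
        obj_source = cdm_file_url
    return IngestPackage(mods_path, obj_source)


if __name__ == "__main__":
    pkg = plan_package("1234", Path("/tmp/output"), "https://cdm.example.org/files/1234")
    print(pkg.mods_path, pkg.obj_source)
```

Keeping the OBJ source behind a single switch means the rest of the toolchain never needs to know where the master came from.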

This is consistent with how the Cdm Single File toolchain works; in fact, the Cdm Compound toolchain essentially wraps the Cdm Single File classes in logic that interprets the compound structure expressed in the source CONTENTdm compound object's .cpd file.
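A rough sketch of that wrapping idea: parse the .cpd file, then hand each child to a single-file writer. The element names (`<page>`, `<pagetitle>`, `<pageptr>`) reflect our understanding of CONTENTdm's .cpd format and should be treated as an assumption; `write_single_file` is a hypothetical callable standing in for the Cdm Single File classes.

```python
import xml.etree.ElementTree as ET

# Minimal .cpd sample; element names are assumed, not taken from MIK's code.
SAMPLE_CPD = """<cpd>
  <type>Document</type>
  <page><pagetitle>Page 1</pagetitle><pagefile>101.jp2</pagefile><pageptr>101</pageptr></page>
  <page><pagetitle>Page 2</pagetitle><pagefile>102.jp2</pagefile><pageptr>102</pageptr></page>
</cpd>"""


def child_pointers(cpd_xml: str) -> list[tuple[str, str]]:
    """Return (pointer, title) for each child page, in document order."""
    root = ET.fromstring(cpd_xml)
    # iter() also reaches pages nested inside <node> elements (monograph-style .cpd).
    return [(page.findtext("pageptr", default="").strip(),
             page.findtext("pagetitle", default="").strip())
            for page in root.iter("page")]


def write_compound(cpd_xml: str, write_single_file) -> None:
    """Delegate each child to a single-file writer (hypothetical callable)."""
    for sequence, (pointer, title) in enumerate(child_pointers(cpd_xml), start=1):
        write_single_file(pointer, title, sequence)


if __name__ == "__main__":
    write_compound(SAMPLE_CPD, lambda ptr, title, seq: print(seq, ptr, title))
```

The compound layer only contributes ordering and structure; all per-file work stays in the single-file writer.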

In the MIK newspaper and book toolchains, we've taken great advantage of the similarity between Islandora's paged content model and CONTENTdm's handling of paged content: we can export from CONTENTdm all of the datastreams used by Islandora's paged content model objects. A newspaper or book page exported from CONTENTdm produces a complete ingest package that creates a page object in Islandora. This ability, combined with Islandora's "Defer derivative generation during ingest" option, has hugely sped up ingestion of newspapers in particular.

Once we move beyond migrating paged content from CONTENTdm to Islandora, the similarities between the CONTENTdm source and the Islandora destination dwindle. Most common Islandora content models store a principal datastream (the OBJ) and one or more derivatives for delivery to end users, whereas CONTENTdm stores only what Islandora would consider derivatives. The only two types of content the two platforms share across content types are structured metadata and a thumbnail.

Beyond those two datastreams, asking CONTENTdm for files that correspond to "all" datastreams for a variety of single-file content models becomes complicated. For some Islandora content models, we could get only a subset of the datastreams needed to create complete objects when ingesting with "Defer derivative generation during ingest" enabled. If the content model of the object we are assembling requires more than we can get from CONTENTdm (the main user-facing derivative, the thumbnail, and the MODS, plus the local master that would become the OBJ datastream), we would be ingesting incomplete objects.
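The completeness problem above is essentially a set difference between what a content model requires and what can be assembled. In this sketch the requirement sets are illustrative assumptions, not an authoritative list of what each Islandora content model needs, and the mapping of CONTENTdm's main derivative to a single DSID is likewise assumed.

```python
# Illustrative requirement sets (assumptions for the sketch only).
REQUIRED = {
    "islandora:sp_basic_image": {"MODS", "OBJ", "MEDIUM_SIZE", "TN"},
    "islandora:sp_large_image_cmodel": {"MODS", "OBJ", "JPG", "JP2", "TN"},
}


def available(derivative_dsid: str) -> set[str]:
    """What we can assemble: MODS, the thumbnail, the local master as OBJ,
    and CONTENTdm's main user-facing derivative mapped to one DSID (assumed)."""
    return {"MODS", "TN", "OBJ", derivative_dsid}


def missing_datastreams(cmodel: str, available_dsids: set[str]) -> list[str]:
    """Datastreams the content model needs that the package would lack."""
    return sorted(REQUIRED[cmodel] - available_dsids)


if __name__ == "__main__":
    print(missing_datastreams("islandora:sp_basic_image", available("MEDIUM_SIZE")))
    print(missing_datastreams("islandora:sp_large_image_cmodel", available("JP2")))
```

Any non-empty result means that, with derivative generation deferred, the ingested object would be incomplete.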

This is a long-winded way of saying that I'm beginning to doubt whether the added complexity of accounting for the inconsistencies between what we can get from CONTENTdm and what we need for complete Islandora ingest packages of varying content models is worth it. Producing the metadata plus one OBJ, taken either a) from CONTENTdm or b) from the local filesystem, would yield a pair of files for each object we are migrating, which seems a lot simpler. I'd be willing to trade faster batch ingests for simplicity of the MIK toolchain.

mjordan, Apr 08 '16 05:04