mik
OAI: If item identifier has special characters, temp metadata filename doesn't match filegetter
I'm doing an OAI migration and running into problems in `src/fetchers/Oaipmh.php`.
The fetcher assumes that `$identifier` and `$record_key` are the same value, but they aren't necessarily.
If the item identifier contains special characters (e.g. `oai:thisvancouver.vpl.ca:islandora_1910`), MIK treats it differently in different contexts.
When writing the temporary metadata files (https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L80-L82), the resulting filename is `oai%3Athisvancouver.vpl.ca%3Aislandora_1910.metadata`.
But the `$record_key` that is used everywhere else in the code looks like this: `oai_thisvancouver.vpl.ca_islandora_1910`.
So you end up with problems like this:
ErrorException.ERROR: ErrorException {"message":"file_get_contents(/Volumes/Arca/tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_410.metadata): failed to open stream: No such file or directory","code":{"record_key":"oai_thisvancouver.vpl.ca_islandora_1910","raw_metadata_path":"/Volumes/Arca/tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_1910.metadata","dom":"[object] (DOMDocument: {})"},"severity":2,"file":"/Users/brandon/sfuvault/mik/src/filegetters/OaipmhModsXpath.php","line":56} []
Because the filegetter looks for `$record_key.metadata` while the actual filename is `$identifier.metadata`, it can't find the file.
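A minimal reproduction of the mismatch, assuming (not confirmed by reading the code here) that the fetcher builds the temp filename with `urlencode()` on the raw identifier, while the record key is built by replacing `:` with `_`. The temp directory path is illustrative only.

```php
<?php
// Assumption: temp filename = urlencode(identifier); record key = ':' replaced by '_'.
$identifier = 'oai:thisvancouver.vpl.ca:islandora_1910';
$tempDir = '/tmp/oaitest_temp'; // illustrative path, not MIK's config

// What gets written: urlencode() turns ':' into %3A but leaves '.' and '_' alone.
$writtenPath = $tempDir . '/' . urlencode($identifier) . '.metadata';

// What the filegetter looks for: the record key plus the .metadata suffix.
$recordKey = str_replace(':', '_', $identifier);
$expectedPath = $tempDir . '/' . $recordKey . '.metadata';

echo $writtenPath, PHP_EOL;  // /tmp/oaitest_temp/oai%3Athisvancouver.vpl.ca%3Aislandora_1910.metadata
echo $expectedPath, PHP_EOL; // /tmp/oaitest_temp/oai_thisvancouver.vpl.ca_islandora_1910.metadata
var_dump($writtenPath === $expectedPath); // bool(false)
```

Any identifier containing a character that `urlencode()` escapes will produce two different paths like this.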
So... how the heck do we fix this?
I'm trying to find where `$record_key` is first defined.
I've never liked the fact that the OAI identifiers are so ugly and complex. There is a spec for OAI-PMH identifiers that defines identifiers using the pattern `oai-identifier = scheme ":" namespace-identifier ":" local-identifier`. (Note that "namespace" here is not related to Fedora namespaces; it identifies the source OAI repository.) We could, in all places in the MIK OAI code, strip out everything but the "local identifier" part and use that as both the filename and the record key. That would at least give us less rope to hang ourselves with, since the filename/record key would be a lot shorter than it is now.
But there is a problem with this: the OAI identifier spec uses `:` to separate the OAI-specific bits from the local identifier, which in the case of Islandora source repos is the PID, which itself contains a `:`.
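A small sketch of why the extra `:` matters when splitting, using a hypothetical identifier whose local part is an Islandora-style PID (`islandora:42` is made up for illustration). A naive `explode()` on `:` cuts the PID in two; passing a limit of 3 keeps everything after the second colon together, which matches the `scheme ":" namespace-identifier ":" local-identifier` shape.

```php
<?php
// Hypothetical OAI identifier whose local part contains a ':'.
$oaiId = 'oai:example.org:islandora:42';

// Naive split: the PID is broken into 'islandora' and '42'.
$naive = explode(':', $oaiId);
echo count($naive), PHP_EOL; // 4

// Limited split: at most 3 pieces, so the local identifier stays whole.
[$scheme, $namespace, $local] = explode(':', $oaiId, 3);
echo $local, PHP_EOL; // islandora:42
```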
Maybe a general way to approach this is to modify MIK to strip out everything except the local identifier part and then replace any `:` with an underscore. If this is done with a central function, we'd just call that function wherever MIK creates or needs to predict an identifier for an object.
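A rough sketch of what such a central function could look like. The name `normalizeOaiIdentifier()` is hypothetical and nothing like it exists in MIK as far as this thread shows. It assumes identifiers follow the `oai:namespace:local` pattern and falls back to the whole string when they don't.

```php
<?php
/**
 * Hypothetical helper: reduce an OAI-PMH identifier to its local part
 * and make it safe to use as both a filename and a record key.
 */
function normalizeOaiIdentifier(string $oaiIdentifier): string
{
    // Split into at most 3 parts so a ':' inside the local identifier survives.
    $parts = explode(':', $oaiIdentifier, 3);

    // Only strip the prefix when the identifier looks like oai:namespace:local.
    $local = (count($parts) === 3 && $parts[0] === 'oai') ? $parts[2] : $oaiIdentifier;

    // Replace any remaining ':' (e.g. in an Islandora PID) with '_'.
    return str_replace(':', '_', $local);
}

echo normalizeOaiIdentifier('oai:thisvancouver.vpl.ca:islandora_1910'), PHP_EOL; // islandora_1910
echo normalizeOaiIdentifier('oai:example.org:islandora:42'), PHP_EOL;           // islandora_42
```

Calling the same function in the fetcher (when writing temp files) and in the filegetter (when reading them) is what would keep the two names in sync. One trade-off: dropping the namespace means two source repositories with the same local identifiers would collide, which is probably fine for a single-repository migration.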
That sounds reasonable to me. Where are you thinking of doing this, and what would the function be?
For a quick-and-dirty patch, I'm thinking the conversion to underscores would have to happen here: https://github.com/MarcusBarnes/mik/blob/master/src/fetchers/Oaipmh.php#L80-L82
That might just do the job... What do you think?
OK, I've made a change. In that section:
$identifier = ($rec->header->identifier);
$identifier = json_decode(json_encode($identifier), 1)[0];
$identifier = urlencode(str_replace(':', '_', $identifier));
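As a side note on the patch above: if `$rec->header->identifier` is a `SimpleXMLElement` (which the `->` access suggests, though that's an assumption here), a plain `(string)` cast reads the same text as the `json_encode`/`json_decode` round trip. The XML below is a simplified stand-in for an OAI-PMH record; real responses carry namespaces.

```php
<?php
// Simplified stand-in for an OAI-PMH <record> (no XML namespaces).
$xml = <<<XML
<record>
  <header>
    <identifier>oai:thisvancouver.vpl.ca:islandora_1910</identifier>
  </header>
</record>
XML;
$rec = simplexml_load_string($xml);

// Round trip used in the patch.
$viaJson = json_decode(json_encode($rec->header->identifier), true)[0];

// Equivalent, simpler: cast the element to its text content.
$viaCast = (string) $rec->header->identifier;

var_dump($viaJson === $viaCast); // bool(true)
echo urlencode(str_replace(':', '_', $viaCast)), PHP_EOL; // oai_thisvancouver.vpl.ca_islandora_1910
```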
This seems to work; I'm getting files! Unfortunately, the files are not being written to the directories that are created... Weird.