Issue 464
GitHub issue: #464
What does this Pull Request do?
Adds a fetcher manipulator that restricts the objects harvested via OAI to those whose OBJ (or other designated datastream) has one of the specified MIME types.
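The filtering idea can be sketched roughly as follows. This is a minimal illustration only, not the code in OaipmhIslandoraByMimetype.php: the function name filterByMimeType, the $getMimeType lookup callable, and the record keys are all hypothetical, and the lookup is stubbed with an array so the example runs without touching a repository.

```php
<?php
// Hypothetical sketch of MIME type filtering; not MIK's actual implementation.

/**
 * Keep only the record keys whose datastream MIME type is in the allowed list.
 *
 * @param string[] $recordKeys       Record identifiers from the harvest.
 * @param string[] $allowedMimeTypes MIME types to keep, e.g. ['image/jpeg'].
 * @param callable $getMimeType      Assumed lookup: record key => MIME type, or null if unknown.
 * @return string[] The record keys that passed the filter.
 */
function filterByMimeType(array $recordKeys, array $allowedMimeTypes, callable $getMimeType): array
{
    // Normalize once so comparisons are case-insensitive.
    $allowed = array_map('strtolower', $allowedMimeTypes);

    $kept = [];
    foreach ($recordKeys as $key) {
        $mimeType = $getMimeType($key);
        if ($mimeType === null) {
            continue; // Unknown type: skip the record rather than guess.
        }
        // Drop parameters such as "; charset=binary" before comparing.
        $mimeType = strtolower(trim(explode(';', $mimeType)[0]));
        if (in_array($mimeType, $allowed, true)) {
            $kept[] = $key;
        }
    }
    return $kept;
}

// Made-up record keys with a stubbed lookup table.
$mimeTypes = [
    'oai:example.org:demo_1' => 'image/jpeg',
    'oai:example.org:demo_2' => 'image/png',
    'oai:example.org:demo_3' => 'Image/JPEG; charset=binary',
];
$lookup = function (string $key) use ($mimeTypes): ?string {
    return $mimeTypes[$key] ?? null;
};

$kept = filterByMimeType(array_keys($mimeTypes), ['image/jpeg'], $lookup);
print_r($kept);
```

Passing the lookup in as a callable keeps the filter itself free of network code, so the same function can be exercised against a stub like the one above.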
What's new?
A new class file, src/fetchermanipulators/OaipmhIslandoraByMimetype.php, and some minor cleanup on testing for an HTTP 200 in src/filegetters/OaipmhIslandoraObj.php.
How should this be tested?
There are no PHPUnit tests for this fetcher manipulator.
To test, use the attached .ini file.
First, run MIK with the fetcher manipulator configured to only harvest objects with the MIME type image/jpeg. This will harvest all 73 objects in the collection:
./mik -c issue-464.ini
Commencing MIK.
Filtering 73 records through the OaipmhIslandoraByMimetype fetcher manipulator.
====================================================================================================> 100%
Creating 73 Islandora ingest packages. Please be patient.
====================================================================================================> 100%
Done. Output packages are in /tmp/oaitest_output. Log is at /tmp/oaitest_output/mik.log
Completed in 0.27316334644953 minutes.
Your output directory should contain .xml and .jpeg files for all 73 objects.
Then, uncomment the fetcher manipulator entry in the .ini file with the image/png MIME type and comment out the other entry. Then rerun MIK, making sure that you delete your output and temp directories first:
./mik -c issue-464.ini
Commencing MIK.
Filtering 73 records through the OaipmhIslandoraByMimetype fetcher manipulator.
====================================================================================================> 100%
Creating 0 Islandora ingest packages. Please be patient.
Done. Output packages are in /tmp/oaitest_output. Log is at /tmp/oaitest_output/mik.log
Completed in 0.10025108655294 minutes.
Your output directory should contain no .xml or .jpeg files, since none of the objects in the harvested collection has the image/png MIME type.
Additional Notes
Wiki entry for this new manipulator is at https://github.com/MarcusBarnes/mik/wiki/Fetcher-manipulator:-OaipmhIslandoraByMimetype. We should link to this wiki entry in the "Manipulators" section of https://github.com/MarcusBarnes/mik/wiki/Toolchain:-OAI-PMH-for-Islandora-repositories.
Interested parties
@MarcusBarnes @bondjimbond
I'll hopefully get to testing this tomorrow. Looks like a promising feature.
Ran the first leg of the test (ini file unchanged), and got problem records for every object. Retrieved XML but no jpeg.
What's in your mik.log?
And manipulator.log
[2018-05-03 12:49:02] ErrorException.ERROR: ErrorException {"message":"Undefined index: datastream_ids","code":{"settings":{"CONFIG":{"config_id":"oai-test","last_updated_on":"2017-02-21","last_update_by":"bw"},"SYSTEM":{"date_default_timezone":"America/Vancouver","verify_ca":"0"},"FETCHER":{"class":"Oaipmh","oai_endpoint":"https://nwcc.arcabc.ca/oai2/","set_spec":"nwcc_freda2","metadata_prefix":"oai_dc","temp_directory":"/tmp/oaitest_temp"},"METADATA_PARSER":{"class":"dc\OaiToDc"},"FILE_GETTER":{"class":"OaipmhIslandoraObj","temp_directory":"/tmp/oaitest_temp"},"WRITER":{"class":"Oaipmh","output_directory":"/tmp/oaitest_output","postwritehooks":["/usr/bin/php extras/scripts/postwritehooks/oai_dc_to_mods.php"]},"MANIPULATORS":{"fetchermanipulators":["OaiMissingFileSet"]},"LOGGING":{"path_to_log":"/tmp/oaitest_output/mik.log","path_to_manipulator_log":"/tmp/oaitest_output/manipulator.log"}}},"severity":8,"file":"/Users/Brandon/mik/src/filegetters/OaipmhIslandoraObj.php","line":41} []
[2018-05-03 12:49:02] ErrorException.ERROR: ErrorException {"message":"problem instantiating fileGetterClass","details":"[object] (mik\exceptions\MikErrorException(code: 0): at /Users/Brandon/mik/mik:105)"} []
[2018-05-03 12:49:06] ErrorException.ERROR: ErrorException {"message":"Undefined variable: filtered_file_list","code":{"file_list":["/tmp/oaitest_output/mik.log"],"filetered_file_list":[],"pattern":"/tmp/oaitest_output/*","file_path":"/tmp/oaitest_output/mik.log"},"severity":8,"file":"/Users/Brandon/mik/src/fetchermanipulators/OaiMissingFileSet.php","line":131} []
[2018-05-03 12:51:23] config.INFO: MIK Configuration {"config_id":"oai-test"} []
[2018-05-03 12:51:23] config.INFO: MIK Configuration {"last_updated_on":"2017-02-21"} []
[2018-05-03 12:51:23] config.INFO: MIK Configuration {"last_update_by":"bw"} []
[2018-05-03 12:51:23] Info.INFO: MIK started running May 3, 2018, 5:51 am [] []
OK, thanks, "Undefined index: datastream_ids" should make it easy to fix, but I'm wondering why it worked for me. Will take a look this evening.
Sorry, I gave you the wrong log output!
[2018-05-03 20:04:22] ErrorException.ERROR: ErrorException {"message":"problem writing package","record_key":"oai%3Adigital.lib.sfu.ca%3Ahiv_1","details":"[object] (GuzzleHttp\Exception\RequestException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/Exception/RequestException.php:52, RuntimeException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. 
See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/functions.php:199)"} []
Can you add verify_ca = false to your .ini file's [SYSTEM] section and try again?
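For reference, the suggested change would look roughly like this in the MIK .ini file. This is a sketch that assumes the rest of the [SYSTEM] section is left as it is:

```ini
[SYSTEM]
; Suggested in the comment above; other [SYSTEM] entries stay unchanged.
verify_ca = false
```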
Set it to false, still problems.
[2018-05-03 20:30:40] ErrorException.ERROR: ErrorException {"message":"problem writing package","record_key":"oai%3Adigital.lib.sfu.ca%3Ahiv_1","details":"[object] (GuzzleHttp\Exception\RequestException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/Exception/RequestException.php:52, RuntimeException(code: 0): No system CA bundle could be found in any of the the common system locations.\nPHP versions earlier than 5.6 are not properly configured to use the system's\nCA bundle by default. In order to verify peer certificates, you will need to\nsupply the path on disk to a certificate bundle to the 'verify' request\noption: http://docs.guzzlephp.org/en/latest/clients.html#verify. If you do not\nneed a specific certificate bundle, then Mozilla provides a commonly used CA\nbundle which can be downloaded here (provided by the maintainer of cURL):\nhttps://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt. Once\nyou have a CA bundle available on disk, you can set the 'openssl.cafile' PHP\nini setting to point to the path to the file, allowing you to omit the 'verify'\nrequest option. 
See http://curl.haxx.se/docs/sslcerts.html for more\ninformation. at /Users/Brandon/mik/vendor/guzzlehttp/guzzle/src/functions.php:199)"} []
As far as I know, that problem is specific to Macs, so I'm afraid I can't be of much help troubleshooting it. See https://github.com/MarcusBarnes/mik/wiki/Cookbook:-Running-MIK-on-Mac-OS-X, which is based on the official Guzzle documentation at http://docs.guzzlephp.org/en/stable/request-options.html#verify. @MarcusBarnes any suggestions?
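The Guzzle error message quoted earlier names one possible route: download a CA bundle and point PHP's openssl.cafile setting at it. A minimal php.ini sketch, assuming the bundle was saved to the placeholder path shown (not a path taken from this thread):

```ini
; php.ini used by the command-line PHP that runs MIK.
; The path below is a placeholder; use the location where you saved the CA bundle.
openssl.cafile = /usr/local/etc/ca-bundle.crt
```

Running php --ini shows which php.ini file the command-line PHP loads, which may differ from the one a web server uses.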
Another error -- I tried using the regular CSV Single File toolchain with this branch (by accident) and got the following:
Fatal error: An iterator cannot be used with foreach by reference in /Users/Brandon/mik/src/fetchers/Csv.php on line 93
On the master branch it's fine.
Branches are out of sync. I'll need to cut a new one from the most recent master. Won't get a chance to do that until after noon my time.
If you got far enough to find that glitch, it sounds like the certificate stuff is no longer a problem. Is that the case?
@mjordan No, it's still the case for OAI toolchain. (Just ran it again to confirm.) Just an additional problem with this branch.