bagit-python
fetch.txt validation fails with `file://<PATH>` lines
BagIt validation fails with "Malformed URLs" when fetch.txt contains lines such as `file:///etc/hosts`. The BagIt specification says fetch.txt should accept any URI as defined by RFC 3986, and under that RFC a local file path with no hostname is written as, e.g., `file:///etc/hosts`.
See also: https://en.wikipedia.org/wiki/File_URI_scheme
The problem is at https://github.com/LibraryOfCongress/bagit-python/blob/master/bagit.py#L775: a parsing error is raised whenever a URL yields no `netloc`, and `netloc` is naturally an empty string when the resource is on the local file system.
Regardless, I think raising a parsing error just because one field of the parse result is empty, even though the URL itself parsed fine, is wrong and should be changed.
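To illustrate, here is a minimal sketch that approximates the check described above (reject any URL lacking either a scheme or a netloc) using the standard library's `urllib.parse.urlparse`. The function name `looks_malformed` is hypothetical, and this is an approximation of the logic at the linked line, not a copy of it.

```python
from urllib.parse import urlparse


def looks_malformed(url):
    """Hypothetical approximation of the fetch.txt check: a URL is
    rejected unless both its scheme and its netloc are non-empty."""
    parsed = urlparse(url)
    return not all((parsed.scheme, parsed.netloc))


# A file URI with an empty authority parses cleanly (scheme="file",
# netloc="", path="/etc/hosts"), yet the check still flags it.
for url in ("https://example.com/data.txt", "file:///etc/hosts"):
    parsed = urlparse(url)
    print(url, repr(parsed.scheme), repr(parsed.netloc), looks_malformed(url))
```

The URL is not a parse failure at all: `urlparse` returns a sensible result, and only the empty `netloc` trips the check.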
Good catch! If you can send a pull request, it should be as simple as adding an exception for the `file` scheme. We clearly also need a test case for this, probably in this repository too, since it's explicitly allowed by the spec:
https://github.com/LibraryOfCongress/bagit-conformance-suite/
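A minimal sketch of the suggested exception, assuming the check keeps using `urlparse`: every URL still needs a scheme, but for the `file` scheme an empty netloc is accepted as long as there is a path. `is_valid_fetch_url` is a hypothetical name, not part of bagit-python's API, and a real patch would presumably raise the library's existing "Malformed URL" error instead of returning a boolean.

```python
from urllib.parse import urlparse


def is_valid_fetch_url(url):
    """Hypothetical fetch.txt URL check with an exception for file URIs.

    Every URL must have a scheme. Most schemes must also have a network
    location, but RFC 3986 allows file URIs with an empty authority
    (file:///etc/hosts), so for "file" only a non-empty path is required.
    """
    parsed = urlparse(url)  # urlparse already lowercases the scheme
    if not parsed.scheme:
        return False
    if parsed.scheme == "file":
        return bool(parsed.path)
    return bool(parsed.netloc)
```

With this check, `file:///etc/hosts` and `file://localhost/etc/hosts` both pass, while `https:///data.txt` (no host) and a bare `/etc/hosts` (no scheme) are still rejected.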
Yes! I can work on a PR next week :D
Thanks, much appreciated!
@acdha Did you get a chance to look at the PR?
@acdha Is there any news on this?