fetch.txt validation fails with `file://<PATH>` lines

Open avivace opened this issue 4 years ago • 4 comments

BagIt validation fails with "Malformed URLs" when fetch.txt contains lines such as "file:///etc/hosts". However, the BagIt specification says that fetch.txt should accept any URI according to RFC 3986, under which a file path with no hostname is written as e.g. file:///etc/hosts.

See also: https://en.wikipedia.org/wiki/File_URI_scheme

The problem is at https://github.com/LibraryOfCongress/bagit-python/blob/master/bagit.py#L775, where a parsing error is thrown whenever a URL yields no 'netloc'. For a resource on the local file system, 'netloc' is naturally an empty string.

Regardless, I think that throwing a parsing error no matter what the parsing actually returned is wrong and should be changed.
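To illustrate the empty 'netloc', here is a minimal sketch using only the standard library's `urllib.parse.urlparse`. The helper name `describe` and the `example.org` URL are made up for this example and are not part of bagit-python.

```python
from urllib.parse import urlparse


def describe(url):
    """Return (scheme, netloc, path) as parsed by the standard library."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


# A local file URI has a scheme and a path, but an empty netloc.
print(describe("file:///etc/hosts"))
# A remote URL has all three components populated.
print(describe("https://example.org/data/file.bin"))
```

So a check that rejects every URL with an empty 'netloc' will reject every host-less `file://` URI, even though it parsed without any error.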

avivace avatar Nov 11 '21 11:11 avivace

Good catch. If you can send a pull request, it should be as simple as adding an exception for the file scheme. Clearly we also need a test case for this, probably in this repository as well, since that's explicitly allowed by the spec.

https://github.com/LibraryOfCongress/bagit-conformance-suite/
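An exception for the file scheme might look roughly like the sketch below. This is an assumption about the shape of the fix, not the actual bagit-python code or the eventual PR. The function name `is_acceptable_fetch_url` is hypothetical, and requiring a non-empty path for `file` URIs is a design choice made for this example.

```python
from urllib.parse import urlparse


def is_acceptable_fetch_url(url):
    """Hypothetical fetch.txt URL check (not bagit-python's real code)."""
    parts = urlparse(url)
    if not parts.scheme:
        # No scheme at all: this is not a usable URI.
        return False
    if parts.scheme == "file":
        # Local files legitimately have an empty netloc,
        # so require a path instead of a host.
        return bool(parts.path)
    # Every other scheme still needs a host component here.
    return bool(parts.netloc)


for candidate in ("file:///etc/hosts", "http://example.org/x", "not a url"):
    print(candidate, is_acceptable_fetch_url(candidate))
```

A matching test case would then only need a fetch.txt with a single `file:///` entry that the validator is expected to accept.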

acdha avatar Nov 12 '21 14:11 acdha

Yes! I can work on a PR next week :D

avivace avatar Nov 12 '21 20:11 avivace

Thanks, much appreciated!

acdha avatar Nov 12 '21 23:11 acdha

@acdha Did you have the chance to take a look at the PR?

avivace avatar Dec 22 '21 13:12 avivace

@acdha do we have any news here?

avivace avatar Jul 14 '23 14:07 avivace