dipper
add logging and testing for "//" in IRIs
We've recently encountered poorly formed IRIs, most likely the result of erroneous ids coming from data providers.
For example, "/S0100-879X2005000100006" arrived as a DOI fragment, which we expand to http://dx.doi.org//S0100-879X2005000100006. That IRI doesn't actually resolve (notice the double slash). A quick Google search shows it is an incomplete id, which should actually be: http://dx.doi.org/10.1590/S0100-879X2005000100006
These will not be caught downstream, so we should catch and warn about them in the dipper pipeline. This check should also be part of our testing suite.
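
A minimal sketch of what such a check could look like, assuming a standalone helper rather than any existing dipper function. The names `has_suspicious_double_slash` and `check_iri` are hypothetical; where this would hook into the pipeline is left open. It flags any "//" that does not directly follow the scheme's colon and logs a warning:

```python
import logging
import re

logger = logging.getLogger(__name__)

# Matches "//" anywhere except directly after a colon (e.g. the "http://" prefix).
_DOUBLE_SLASH = re.compile(r'(?<!:)//')


def has_suspicious_double_slash(iri):
    """Return True if the IRI contains '//' outside the scheme separator."""
    return _DOUBLE_SLASH.search(iri) is not None


def check_iri(iri):
    """Hypothetical helper: warn on a suspicious IRI and report whether it passed."""
    if has_suspicious_double_slash(iri):
        logger.warning("IRI contains unexpected '//': %s", iri)
        return False
    return True
```

The two IRIs from this issue would make natural test cases: the expanded fragment should be flagged, and the corrected DOI should pass. Note that this only detects the symptom; it cannot recover the missing "10.1590" prefix.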