
test and document redirected license historical URLs

Open • maxheld83 opened this issue 5 years ago • 1 comment

Some license info URLs, such as http://olabout.wiley.com/WileyCDA/Section/id-815641.html, now redirect to another page. In this case, the page you're redirected to is so general that it can no longer be taken as an indication of an open license.

Ah, linkrot, our old foe.

This raises some questions / todos:

  • [ ] should we test on every commit whether URLs still exist?
  • [ ] should we test on every commit whether URLs redirect?
  • [ ] if either of the above fails, should we somehow record a now-deprecated URL as indicative of an open access license during some interval in the past? (I'm guessing that Crossref and other upstream databases wouldn't necessarily update all the license URLs, but that these might remain whatever they were at publication.)
  • [ ] if yes, we should probably keep an archive of what these URLs looked like at said point in the past.
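A minimal sketch of the first two checks, using only Python's standard library `urllib`: it requests a URL once without following redirects and records the status code and, for redirects, the `Location` target. `check_license_url` and `UrlStatus` are hypothetical names, and treating only a direct 2xx response as "still exists" is an assumption. It only catches HTTP-level redirects; a page that moves via JavaScript or a meta refresh, or that keeps its URL but changes its content, would not be flagged.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects, so urllib raises HTTPError for 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


@dataclass
class UrlStatus:
    url: str
    status: Optional[int]    # HTTP status code; None if there was no HTTP response at all
    location: Optional[str]  # raw Location header of a redirect, if any

    @property
    def exists(self) -> bool:
        # Policy choice (assumption): only a direct 2xx answer counts as "still exists".
        return self.status is not None and 200 <= self.status < 300

    @property
    def redirects(self) -> bool:
        return self.status is not None and 300 <= self.status < 400


def check_license_url(url: str, timeout: float = 10.0) -> UrlStatus:
    """Request `url` once, without following redirects, and report what came back."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return UrlStatus(url, resp.status, None)
    except urllib.error.HTTPError as err:
        # 3xx (because redirects are not followed), 4xx and 5xx all end up here.
        return UrlStatus(url, err.code, err.headers.get("Location"))
    except OSError:
        # DNS failures, refused connections and timeouts: no HTTP answer at all.
        return UrlStatus(url, None, None)
```

Looped over the list of known license URLs in CI, this could either fail the build or just produce a report, with `location` giving the new target to review by hand. Because it hits external sites, a scheduled job may turn out less flaky than running it on every commit.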

Apologies if this is already completely covered by some other plan or data source concerning license patterns.

maxheld83 • Jun 03 '20 18:06

In my experience with rules on data availability (Höffler, Jan H. 2017. "Replication and Economics Journal Policies." American Economic Review, 107 (5): 52-55. DOI: 10.1257/aer.p20171032), publishers and their journals change their rules frequently. Some of the license URLs we identified, such as https://www.cambridge.org/core/terms, state the date of their last update. So it would not be enough to screen the hundreds of licenses already identified, regularly look up which new license URLs are being used (up to 58 in just one year in one of the datasets we use) and, as noted above, check whether each of these pages still exists and, if it redirects, where to and since when. One would also have to track changes to the license information on pages we already know. The Internet Archive can help to identify different versions, for example https://web.archive.org/web/20190107085013/https://www.cambridge.org/core/legal-notices/terms, but going through this for so many licenses is very tedious, and not all changes are stored. On top of that, how do we know which version the publishers actually meant to refer to if they sometimes deposit license information years after the articles are published?
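One small piece of the version question can be made mechanical. A minimal sketch, assuming we already have the list of capture timestamps for a license page (fetching that list, for instance from the Internet Archive's CDX index, is not shown): it builds a Wayback Machine URL in the `web/<14-digit timestamp>/<url>` form used in the link above, and picks the latest capture at or before a given date, such as the article's publication date. `wayback_url` and `capture_in_force` are hypothetical helper names. Since not every change is archived, the capture found this way is at best the last known version before that date, not necessarily the one the publisher meant.

```python
import bisect
from datetime import datetime
from typing import Optional, Sequence

WAYBACK = "https://web.archive.org/web/"
TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamp, as in .../web/20190107085013/...


def wayback_url(url: str, when: datetime) -> str:
    """Wayback Machine URL for `url` at the timestamp `when`."""
    return f"{WAYBACK}{when.strftime(TS_FORMAT)}/{url}"


def capture_in_force(captures: Sequence[str], date: datetime) -> Optional[str]:
    """Latest capture timestamp at or before `date`, or None if the page
    had not been archived by then. `captures` may be in any order."""
    times = sorted(datetime.strptime(ts, TS_FORMAT) for ts in captures)
    i = bisect.bisect_right(times, date)  # number of captures <= date
    return times[i - 1].strftime(TS_FORMAT) if i else None
```

For example, with captures `["20170301000000", "20190107085013"]`, an article published on 2018-06-01 maps to the 2017 capture, and one published before March 2017 maps to `None`, i.e. no archived version to compare against.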

jhoeffler • Jun 04 '20 09:06