find_doi() misses part of the DOI in some cases

Open gorkang opened this issue 3 years ago • 2 comments

Hi there,

I'm pretty sure this is because the journal uses non-standard DOIs, but I wanted to let you know in case there is a "simple fix" for these.

So, in some instances, find_doi() does not extract the full DOI. For example:

https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19970515)16:9%3C981::AID-SIM510%3E3.0.CO;2-N

retractcheck::find_doi("10.1002/(sici)1097-0258(19970515)16:9<981::aid-sim510>3.0.co;2-n")

[1] "10.1002/(SICI)1097-0258(19970515)16:9"

Some other cases:

  • https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19980430)17:8%3C946::AID-SIM2823%3E3.0.CO;2-3
  • https://onlinelibrary.wiley.com/doi/10.1002/%28SICI%291097-0258%2819960715%2915%3A13%3C1377%3A%3AAID-SIM275%3E3.0.CO%3B2-%23
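The URLs above differ only in how much of the DOI is percent-encoded. A minimal Python sketch (not part of retractcheck; `doi_from_url` is a hypothetical helper, and it assumes the URL path has the form `/doi/<DOI>`) shows the decoded strings a DOI extractor would have to handle:

```python
from urllib.parse import unquote, urlsplit


def doi_from_url(url: str) -> str:
    """Return the percent-decoded DOI from a URL whose path is /doi/<DOI>.

    Hypothetical helper for illustration only. urlsplit is used rather
    than urlparse because urlparse would split the path at the ';'.
    """
    path = urlsplit(url).path
    prefix = "/doi/"
    if not path.startswith(prefix):
        raise ValueError(f"unexpected URL layout: {url}")
    return unquote(path[len(prefix):])


partly_encoded = (
    "https://onlinelibrary.wiley.com/doi/10.1002/"
    "(SICI)1097-0258(19970515)16:9%3C981::AID-SIM510%3E3.0.CO;2-N"
)
fully_encoded = (
    "https://onlinelibrary.wiley.com/doi/10.1002/"
    "%28SICI%291097-0258%2819960715%2915%3A13%3C1377%3A%3AAID-SIM275"
    "%3E3.0.CO%3B2-%23"
)

print(doi_from_url(partly_encoded))
print(doi_from_url(fully_encoded))
```

Once decoded, these DOIs contain `<`, `>`, and in the last case `#`, none of which are typical of modern DOIs.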

Thanks for the awesome package.

gorkang, Apr 08 '22 10:04

Hey @gorkang :wave: Thanks for taking the time to report this issue and welcome to the repo!

These kinds of DOIs are horrible - I am glad they standardized them at some point. There's a bit more information on it in this CrossRef blog post from 2015.

We could add fallback regular expressions to increase coverage, tried in this order (taken from the CrossRef blog):

  1. /^10.1002/[^\s]+$/i
  2. /^10.\d{4}/\d+-\d+X?(\d+)\d+<[\d\w]+:[\d\w]*>\d+.\d+.\w+;\d$/i
  3. /^10.1021/\w\w\d++$/i
  4. /^10.1207/[\w\d]+\&\d+_\d+$/i
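The ordered-fallback idea could look roughly like the Python sketch below. It is only an illustration under several assumptions, not how retractcheck works: the names `PATTERNS` and `first_matching_pattern` are hypothetical, the "modern" pattern is the primary one recommended in the same CrossRef post (it is not quoted in this thread), the dots and literal parentheses are escaped according to my reading of the intent, and the possessive `\d++` is written as `\d+` because Python only supports possessive quantifiers from 3.11.

```python
import re

FLAGS = re.IGNORECASE

# Tried top to bottom; the first full match wins.
PATTERNS = [
    # Assumed primary pattern from the CrossRef post (not quoted in the thread).
    ("modern", re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", FLAGS)),
    # Fallbacks 1-4 from the list above, with dots and parentheses escaped.
    ("fallback-1", re.compile(r"10\.1002/\S+", FLAGS)),
    ("fallback-2", re.compile(
        r"10\.\d{4}/\d+-\d+X?\(\d+\)\d+<[\d\w]+:[\d\w]*>\d+\.\d+\.\w+;\d",
        FLAGS)),
    ("fallback-3", re.compile(r"10\.1021/\w\w\d+", FLAGS)),
    ("fallback-4", re.compile(r"10\.1207/[\w\d]+&\d+_\d+", FLAGS)),
]


def first_matching_pattern(candidate):
    """Return the name of the first pattern that matches the whole string."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(candidate):
            return name
    return None


sici = "10.1002/(sici)1097-0258(19970515)16:9<981::aid-sim510>3.0.co;2-n"
print(first_matching_pattern(sici))          # fallback-1
print(first_matching_pattern("10.1000/182"))  # modern
```

Used unanchored, the "modern" pattern stops at the first `<`: `PATTERNS[0][1].search(sici).group(0)` gives `10.1002/(sici)1097-0258(19970515)16:9`, which resembles the truncation reported above. Trying the strict pattern first and the permissive `10.1002` fallback second keeps ordinary DOIs on the strict path.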

Do you have an urgent need for this feature?

I know we're still looking for funds to upgrade the OpenRetractions database behind this, so this isn't high priority on my end right now (funding remains an issue with these things...). We have made a few attempts, but nothing has stuck yet.

chartgerink, Apr 08 '22 10:04

Thanks @chartgerink for the prompt response!

Yes, the DOIs are as horrible as they come... thanks for the CrossRef blog post.

No urgent need at all. I am using find_doi() in a package that helps automatically download all the references from papers ({downloadReferences}), and I stumbled upon these weird DOIs.

Good luck finding funding.

gorkang, Apr 08 '22 11:04