
returning more than 1000 DOIs using EuropePMC database

banderson10 opened this issue 2 years ago · 2 comments

Hello,

I have two questions about using ft_search() to return DOIs from the EuropePMC database. Both questions are below, each with an example.

  1. When I run the first example below, which is the example from the fulltext manual, the 'a' object contains a variable, a$europmc$data$doi, holding the 1,000 DOIs.

```r
library(fulltext)

res <- ft_search(query="ecology", from='europmc')
a <- ft_search(query="ecology", from='europmc', limit=1000, euroopts = list(cursorMark = res$europmc$cursorMark))
```

When I change the query to the term I actually need, ft_search() returns no DOIs: a1$europmc$data$doi does not exist in the a1 object.

```r
res1 <- ft_search(query="spanish flu", from='europmc')
a1 <- ft_search(query="spanish flu", from='europmc', limit=1000, euroopts = list(cursorMark = res1$europmc$cursorMark))
```
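A defensive way to pull the DOIs out of a result, sketched under the assumption that the records come back as a data frame in `a1$europmc$data`, as in the calls above. The helper name `get_dois()` is hypothetical and not part of fulltext; it returns an empty character vector instead of `NULL` when the `doi` column is missing, so the "no DOIs" case becomes explicit rather than silently `NULL`:

```r
# Hypothetical helper (not part of fulltext): return the non-missing DOIs
# from a search-result data frame, or character(0) if there is no doi column.
get_dois <- function(df) {
  if (!is.data.frame(df) || !"doi" %in% names(df)) {
    return(character(0))
  }
  d <- as.character(df$doi)
  d[!is.na(d) & nzchar(d)]  # drop NA and empty strings
}

# Usage with the objects above (requires network access):
# length(get_dois(a$europmc$data))   # DOIs found for "ecology"
# length(get_dois(a1$europmc$data))  # 0 if no doi column came back
```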

I need the DOIs because I am also searching other databases with ft_search(), and I use the DOI as the unique identifier to remove duplicates before fetching the full-text XML files.
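For the de-duplication step, a minimal sketch that assumes the results from every source have already been combined into one data frame with a `doi` column; `normalize_doi()` and `dedupe_by_doi()` are hypothetical names. DOI names are case-insensitive, so the key is lower-cased, and common `https://doi.org/` and `doi:` prefixes are stripped before comparing. Records without a DOI cannot be matched this way, so they are all kept:

```r
# Hypothetical helper: turn DOIs from different sources into a comparable key.
normalize_doi <- function(x) {
  x <- trimws(tolower(as.character(x)))
  x <- sub("^https?://(dx\\.)?doi\\.org/", "", x)  # strip resolver URL
  x <- sub("^doi:\\s*", "", x)                     # strip "doi:" prefix
  x
}

# Keep the first row for each DOI; rows with a missing DOI are never dropped.
dedupe_by_doi <- function(df, doi_col = "doi") {
  key <- normalize_doi(df[[doi_col]])
  keep <- is.na(key) | !duplicated(key)
  df[keep, , drop = FALSE]
}
```

Because the first occurrence wins, binding the source you prefer first (e.g. `dedupe_by_doi(rbind(epmc_df, other_df))`, with `epmc_df` and `other_df` as hypothetical data frames sharing the same columns) decides which copy is kept.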

  2. Obtaining more than 1,000 DOIs from a EuropePMC search.

I have read issue #184 for this package, in which the author explains that you have to use a cursor to 'page through' the query results. Using the example from the fulltext manual, shown below, the query returns 416,312 hits.

```r
res <- ft_search(query='ecology', from='europmc')
res$europmc
```

You can then use the cursorMark argument to 'page through' the results. The code below will return the first 1,000 hits.

```r
a2 <- ft_search(query='ecology', from='europmc', limit=1000, euroopts = list(cursorMark = res$europmc$cursorMark))
```

My question is how to obtain the next 1,000 hits, then the 1,000 after that, and so on. For example, what if you wanted to obtain all 416,312 DOIs?
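One way to page through every result is to bypass ft_search() and call the Europe PMC REST search endpoint directly, feeding each response's `nextCursorMark` back in as the next `cursorMark`, starting from `*`. This is a sketch, not fulltext's own method. It assumes the jsonlite package, that the `resultType=lite` JSON response includes a `doi` field, a page size of 1000 (believed to be the API's maximum), and Solr-style cursor behaviour where the end is signalled by an empty page or an unchanged cursor. `epmc_page()` and `collect_dois()` are hypothetical names:

```r
# Sketch: page through a Europe PMC search via its REST API and collect DOIs.
# Assumptions: the public search endpoint below, the jsonlite package, and
# that each JSON response carries resultList$result and nextCursorMark.

# Fetch one page; returns the number of records, their DOIs and the next cursor.
epmc_page <- function(query, cursor = "*", page_size = 1000) {
  url <- paste0(
    "https://www.ebi.ac.uk/europepmc/webservices/rest/search",
    "?query=", utils::URLencode(query, reserved = TRUE),
    "&resultType=lite&format=json",
    "&pageSize=", page_size,
    "&cursorMark=", utils::URLencode(cursor, reserved = TRUE)
  )
  res <- jsonlite::fromJSON(url)
  hits <- res$resultList$result
  n <- if (is.data.frame(hits)) nrow(hits) else 0L
  dois <- if (n > 0 && "doi" %in% names(hits)) hits$doi else character(0)
  list(n = n, dois = dois, next_cursor = res$nextCursorMark)
}

# Follow the cursor until a page is empty, the cursor stops changing,
# or max_pages is reached. `fetch_page` takes a cursor string and returns
# a list shaped like epmc_page()'s result.
collect_dois <- function(fetch_page, cursor = "*", max_pages = Inf) {
  dois <- character(0)
  pages <- 0
  repeat {
    page <- fetch_page(cursor)
    pages <- pages + 1
    d <- page$dois
    dois <- c(dois, d[!is.na(d) & nzchar(d)])  # keep only real DOIs
    done <- page$n == 0 || is.null(page$next_cursor) ||
      identical(page$next_cursor, cursor) || pages >= max_pages
    if (done) break
    cursor <- page$next_cursor
  }
  unique(dois)
}

# Usage (network access required):
# all_dois <- collect_dois(function(cur) epmc_page("ecology", cur))
```

The stop test uses the record count rather than the DOI count, because a page can contain records without DOIs and still be followed by more pages. At 1,000 records per page, the 416,312 "ecology" hits would take roughly 417 requests, so trying `max_pages = 2` first is a cheap sanity check.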

Thank you for any advice/suggestions you can provide!

Billie

banderson10 · Jul 22 '21 13:07

Hi @banderson10, I've changed jobs and haven't been able to find a new maintainer for this package yet.

sckott · Jul 27 '21 15:07

Hi Scott,

I understand. I will post my question to stackoverflow.com.

Thanks for letting me know.

Billie

Billie Anderson, Ph.D., Assistant Professor of Applied Statistics, UMKC Henry W. Bloch School of Management, 5110 Cherry Street, Suite 327, Kansas City, MO 64110, (816) 235-2301

From: Scott Chamberlain
Sent: Tuesday, July 27, 2021 10:58 AM
To: ropensci/fulltext
Cc: Anderson, Billie; Mention
Subject: Re: [ropensci/fulltext] returning more than 1000 DOIs using EuropePMC database (#242)


banderson10 · Jul 27 '21 17:07

This repository is about to be archived. If you develop a related package, it might be in scope for https://ropensci.org/software-review/

maelle · Sep 09 '22 09:09