rentrez

Query building in R

Open • dwinter opened this issue 8 years ago • 10 comments

(tagging @Monty9 and @htc502 on this, as they've each brought it up recently)

The query syntax used by esearch (and wrapped in entrez_search) is very powerful, but somewhat difficult to type. The basic format includes keywords, fields (denoted by square brackets) and the Boolean operators AND, OR and NOT. So you might have:

("neoplasms"[MeSH Major Topic] AND Mouse[Title/Abstract]) NOT review[Publication Type]
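To make the structure concrete, here is a minimal base-R sketch that assembles the example query above from its parts. `tag()` is a hypothetical helper for illustration, not part of rentrez; the parentheses group the AND clause before NOT is applied.

```r
# Hypothetical helper (not part of rentrez): attach a field tag to a term.
tag <- function(term, field) sprintf("%s[%s]", term, field)

# Rebuild the example query; the parentheses group the AND clause
# so that NOT applies to the combined result.
q <- sprintf("(%s AND %s) NOT %s",
             tag('"neoplasms"', "MeSH Major Topic"),
             tag("Mouse", "Title/Abstract"),
             tag("review", "Publication Type"))
cat(q, "\n", sep = "")
#> ("neoplasms"[MeSH Major Topic] AND Mouse[Title/Abstract]) NOT review[Publication Type]
```

The resulting string can be passed as the term argument, e.g. `entrez_search(db = "pubmed", term = q)`.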

The NCBI has an "advanced query builder" for each database, but it would be nice to be able to generate these queries in an R session.

Right now, we have entrez_db_searchable to list the searchable fields for each database. We could also add a query builder: either a single function that takes 2- or 3-member vectors as arguments:

query_builder(c('neoplasm', 'MeSH'),
              c('AND', 'mouse', 'TIAB'))
(neoplasm[MeSH] AND mouse[TIAB])

.. or, taking a leaf out of the ggplot2 book, something like a domain-specific language:

q <- eq(query = 'neoplasms', field = 'MeSH') + eq(query = 'Mouse', field = 'TIAB', operator = 'AND')
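A minimal sketch of the function-based `query_builder()` interface proposed above. It assumes the first argument is `c(term, field)` and every later argument is `c(operator, term, field)`; the function is hypothetical and not part of rentrez.

```r
# Hypothetical implementation of the proposed query_builder() interface.
# Assumption: first argument is c(term, field); each later argument is
# c(operator, term, field) and is appended to everything before it.
query_builder <- function(first, ...) {
  if (length(first) != 2) stop("first argument must be c(term, field)")
  out <- sprintf("%s[%s]", first[1], first[2])
  for (p in list(...)) {
    if (length(p) != 3) stop("later arguments must be c(operator, term, field)")
    op <- toupper(p[1])
    if (!op %in% c("AND", "OR", "NOT")) stop("unknown operator: ", p[1])
    out <- paste(out, op, sprintf("%s[%s]", p[2], p[3]))
  }
  sprintf("(%s)", out)  # wrap the whole chain in parentheses
}

query_builder(c("neoplasm", "MeSH"),
              c("AND", "mouse", "TIAB"))
#> [1] "(neoplasm[MeSH] AND mouse[TIAB])"
```

Because each operator just extends a flat left-to-right chain, this sketch cannot express groupings such as `A AND (B OR C)`.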

Doing this properly will definitely take more time than I have at present, but I'm happy to hear opinions about the best way to do it (and to help anyone who wants to try it for themselves).
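The ggplot2-style alternative could be sketched with an S3 `+` method. `eq()`, the `entrez_query` class and its methods are hypothetical names, not rentrez API. Each `+` joins the right-hand term using that term's operator and wraps the result in parentheses, so every combination is explicitly grouped.

```r
# Hypothetical ggplot2-style DSL; none of these names are part of rentrez.
eq <- function(query, field, operator = "AND") {
  structure(list(text = sprintf("%s[%s]", query, field),
                 operator = toupper(operator)),
            class = "entrez_query")
}

# Join e2 onto e1 with e2's operator and parenthesise the result.
# The combined query keeps e1's operator, so it can itself be used
# on the right-hand side of another `+`.
"+.entrez_query" <- function(e1, e2) {
  structure(list(text = sprintf("(%s %s %s)", e1$text, e2$operator, e2$text),
                 operator = e1$operator),
            class = "entrez_query")
}

as.character.entrez_query <- function(x, ...) x$text
print.entrez_query <- function(x, ...) {
  cat(x$text, "\n", sep = "")
  invisible(x)
}

q <- eq(query = "neoplasms", field = "MeSH") +
  eq(query = "Mouse", field = "TIAB", operator = "AND")
q
#> (neoplasms[MeSH] AND Mouse[TIAB])
```

Since every `+` adds its own parentheses, nesting such as `eq("a", "X") + (eq("b", "Y") + eq("c", "Z", operator = "OR"))` renders as `(a[X] AND (b[Y] OR c[Z]))`.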

dwinter, Sep 08 '15 02:09

Hi David, I was asking the same question on Stack Overflow and @mrdwab (Ananda Mahto) helped me out with this solution:

x <- c("neoplasm", "Lung", "Clinical Trial", "human", "2000:2015")  # search terms
y <- c("MeSH", "TIAB", "PTYP", "Species", "PDAT")                   # field tag for each term
noquote(sprintf("(%s)", paste(x, "[", y, "]", sep = "", collapse = ", ")))  # tag, join, wrap in ()

Output

(neoplasm[MeSH], Lung[TIAB], Clinical Trial[PTYP], human[Species], 2000:2015[PDAT])

gadepallivs, Sep 08 '15 17:09

Interesting @Monty9 -- can you link to the SO question?

dwinter, Sep 08 '15 17:09

https://stackoverflow.com/questions/32462726/r-how-to-combine-two-char-vectors-so-that-result-looks-like-char1-char2

My goal is to let the user enter search terms and fields, then combine them into a single query string and pass it to entrez_search. That way, there is no need to pass the "field" parameter separately to entrez_search.
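The workflow described above might look like the following sketch. `build_term()` and `search_terms()` are hypothetical names, not rentrez API; only `rentrez::entrez_search(db, term, ...)` is a real call, and it is only invoked when `search_terms()` is run.

```r
# Hypothetical helper: combine parallel vectors of terms and field tags
# into one query string joined by a single Boolean operator.
build_term <- function(terms, fields, operator = "AND") {
  if (length(terms) != length(fields)) {
    stop("'terms' and 'fields' must have the same length")
  }
  paste(sprintf("%s[%s]", terms, fields), collapse = paste0(" ", operator, " "))
}

# Hypothetical wrapper: the fields are already embedded in `term`,
# so no separate field argument is passed to entrez_search().
search_terms <- function(terms, fields, db = "pubmed", ...) {
  rentrez::entrez_search(db = db, term = build_term(terms, fields), ...)
}

build_term(c("neoplasm", "Lung"), c("MeSH", "TIAB"))
#> [1] "neoplasm[MeSH] AND Lung[TIAB]"
```

A call such as `search_terms(c("neoplasm", "mouse"), c("MeSH", "ORGN"), retmax = 20)` would then forward `retmax` through `...` to `entrez_search()`.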

gadepallivs, Sep 08 '15 18:09

So, one thing to do is to separate adding the square brackets from the concatenation:

boxify <- function(x) paste0("[",x,"]")

Then you could do something like this:

terms <- c("neoplasm", "mouse", "review")
fields <- c("Mesh", "Orgn", "PTYP")
paste0(terms, boxify(fields), collapse=" AND ")
"neoplasm[Mesh] AND mouse[Orgn] AND review[PTYP]"

Note that this won't easily work for nested uses of AND, OR and NOT.
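One way around the nesting problem, offered as an assumption rather than a planned design, is to describe the query as a tree of nested lists and render it recursively, parenthesising every operator node so groupings always balance. `render_query()` and the list layout (a leaf is `c(term, field)`, a node is `list(op = ..., children...)`) are hypothetical.

```r
# Hypothetical sketch: a leaf is c(term, field); a node is
# list(op = "AND" / "OR" / "NOT", child1, child2, ...).
render_query <- function(node) {
  if (is.character(node)) {
    return(sprintf("%s[%s]", node[1], node[2]))  # leaf: term[field]
  }
  op <- toupper(node$op)
  stopifnot(op %in% c("AND", "OR", "NOT"))
  # Render every child except the `op` entry, then join with the operator.
  children <- vapply(node[names(node) != "op"], render_query, character(1))
  sprintf("(%s)", paste(children, collapse = paste0(" ", op, " ")))
}

tree <- list(op = "NOT",
             list(op = "AND", c("neoplasms", "MeSH"), c("mouse", "TIAB")),
             c("review", "PTYP"))
render_query(tree)
#> [1] "((neoplasms[MeSH] AND mouse[TIAB]) NOT review[PTYP])"
```

Parenthesising every node produces some redundant outer parentheses, but it means the caller never has to reason about operator precedence.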

dwinter, Sep 08 '15 19:09

Hi @dwinter, which way have you chosen to implement the builder function? The simpler way or the ggplot way? Why not open a new branch for this feature? I propose the former, as it is easier and we could use the feature right away without much effort ~_~ ... eager to try it out...

htc502, Sep 18 '15 08:09

Hi @htc502 -- I don't think either way is very easy :)

The problem is being able to balance the ANDs, ORs and NOTs.

It definitely won't make it into the next release, but I'm keen to work on it for a future one.

dwinter, Sep 18 '15 16:09

Hi @dwinter, I find a query builder like this comfortable to use: [screenshot: Screen Shot 2016-04-14 at 6:02:00 PM] I borrowed this from a reference manager, Papers 3. Whenever you type something, it pops up a box like this that lets you modify the attributes of your keyword. I don't know if we can find an alternative in an R terminal environment ~_~

htc502, Apr 14 '16 09:04

I think an advanced search formula builder implemented as an RStudio Addin would be very helpful.

sbalci, Apr 26 '19 16:04

Hi Sbalci,

That would be awesome, but I really don't have the time or skill to make something like this.

I think it would be a cool addition, and I would love to work with someone who wanted to do it, but it's probably not on the horizon just now.

dwinter, Apr 28 '19 22:04

Dear @dwinter

I have tried to make an RStudio Addin for this purpose. The code still needs some editing, which I am working on. If you find it useful, I may make a pull request when it is complete.

  • A GIF is here: [GIF: PubMedSearch]

  • Code is here: https://github.com/sbalci/histopathRaddins/blob/master/R/pubmed_search.R

sbalci, Dec 24 '19 15:12