rentrez
Query building in R
(tagging @Monty9 and @htc502 on this as they've each brought it up recently)
The query syntax used in esearch (and wrapped in entrez_search) is very powerful, but somewhat difficult to type. The basic format includes keywords, fields (denoted by square brackets) and the boolean operators AND, OR and NOT. So you might have
("neoplasms"[MeSH Major Topic] AND Mouse[Title/Abstract]) NOT review[Publication Type]
The NCBI has an "advanced query builder" for each database, but it would be nice to be able to generate these queries in an R session.
Right now, we have entrez_db_searchable to list the possible search fields. We could also add a query builder. Either a single function that takes 2- or 3-member vectors as arguments:
query_builder( c('neoplasms', 'MeSH'),
               c('AND', 'mouse', 'TIAB')
)
(neoplasms[MeSH] AND mouse[TIAB])
.. or taking a leaf out of the ggplot2 book and making something like a domain-specific language:
q <- eq(query='neoplasms', field='MeSH') + eq(query='Mouse', field='TIAB', operator='AND')
Doing this properly will definitely take more time than I have at present, but I'm happy to hear opinions about the best way to do it (and to help anyone who wants to try it for themselves).
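To make the first option concrete, here is a minimal sketch of how such a function might look. The name query_builder comes from the proposal above, but the argument layout (a first c(term, field) vector followed by any number of c(operator, term, field) vectors) is only an assumption, and nothing like this exists in rentrez. It builds a flat, left-to-right query and makes no attempt at nesting.

```r
# Hypothetical helper, not part of rentrez.
# `first` is c(term, field); every later argument is c(operator, term, field).
query_builder <- function(first, ...) {
  stopifnot(is.character(first), length(first) == 2)
  parts <- sprintf("%s[%s]", first[1], first[2])
  for (clause in list(...)) {
    stopifnot(is.character(clause), length(clause) == 3)
    # Accept the operator in any case, but only AND, OR or NOT
    op <- match.arg(toupper(clause[1]), c("AND", "OR", "NOT"))
    parts <- c(parts, sprintf("%s %s[%s]", op, clause[2], clause[3]))
  }
  # Wrap the whole flat query in a single pair of parentheses
  paste0("(", paste(parts, collapse = " "), ")")
}

query_builder(c("neoplasms", "MeSH"), c("AND", "mouse", "TIAB"))
# "(neoplasms[MeSH] AND mouse[TIAB])"
```

The resulting string could then be passed as the term argument of entrez_search.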
Hi David, I asked the same question on Stack Overflow and @mrdwab (Ananda Mahto) helped me out with this solution.
x <- c("neoplasm", "Lung", "Clinical Trial", "human", "2000:2015")
y <- c("MeSH", "TIAB", "PTYP", "Species","PDAT")
noquote(sprintf("(%s)", paste(x, "[", y, "]", sep = "", collapse = ", ")))
Output
(neoplasm[MeSH], Lung[TIAB], Clinical Trial[PTYP], human[Species], 2000:2015[PDAT])
Interesting @Monty9 -- can you link to the SO question?
https://stackoverflow.com/questions/32462726/r-how-to-combine-two-char-vectors-so-that-result-looks-like-char1-char2
My goal is to let the user input their search terms and fields, then combine them into a single query and pass that query as a parameter to entrez_search. That way there is no need to pass the "field" parameter separately to the entrez_search function.
So, one thing to do is separate adding the square brackets from the concatenating:
boxify <- function(x) paste0("[",x,"]")
Then you could do something like this
terms <- c("neoplasm", "mouse", "review")
fields <- c("Mesh", "Orgn", "PTYP")
paste0(terms, boxify(fields), collapse=" AND ")
"neoplasm[Mesh] AND mouse[Orgn] AND review[PTYP]"
Note that this is not going to work easily for nested uses of AND, OR and NOT.
Hi @dwinter, which way have you chosen to implement the builder function: the simpler way or the ggplot way? Why not open a new branch for this feature? I propose the former, as it is easier and we could use the feature right away without much effort ~_~ .. eager to try it out...
Hi @htc502 -- I don't think either way is very easy :)
The problem is being able to balance the ANDs, ORs and NOTs.
Definitely won't make it to the next release, but I'm keen to work on it for a future one.
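One way the balancing could be handled is to let R's own operator precedence and parentheses do the work: give each sub-query an S3 class and parenthesise every combination as it is built. The sketch below is only an illustration of that idea. The eq constructor borrows its name from the proposal earlier in the thread, while the choice of & for AND, | for OR and binary - for NOT is an assumption made here, not an agreed design.

```r
# Hypothetical constructor: a single term, optionally tagged with a field.
eq <- function(query, field = NULL) {
  text <- if (is.null(field)) query else paste0(query, "[", field, "]")
  structure(list(text = text), class = "eq")
}

# Group generic for class "eq": & means AND, | means OR, binary - means NOT.
# Every combination is wrapped in parentheses, so nesting stays balanced.
Ops.eq <- function(e1, e2) {
  if (missing(e2)) stop("unary operators are not supported")
  op <- switch(.Generic, "&" = "AND", "|" = "OR", "-" = "NOT",
               stop("unsupported operator: ", .Generic))
  structure(list(text = sprintf("(%s %s %s)", e1$text, op, e2$text)),
            class = "eq")
}

# Return or print the query string
format.eq <- function(x, ...) x$text
print.eq <- function(x, ...) cat(x$text, "\n")

q <- (eq("neoplasms", "MeSH Major Topic") & eq("Mouse", "Title/Abstract")) -
  eq("review", "Publication Type")
format(q)
# "((neoplasms[MeSH Major Topic] AND Mouse[Title/Abstract]) NOT review[Publication Type])"
```

The extra outer parentheses are redundant but keep the rule simple: every operator adds exactly one matched pair.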
Hi @dwinter, I find a query builder like this comfortable to use:
I borrowed this from a reference manager, Papers 3. Whenever you type something, it pops up a box like this, allowing you to modify the attributes of your keyword.
I don't know if we can find an equivalent in an R terminal environment ~_~
I think an advanced search query builder provided as an RStudio addin would be very helpful.
Hi Sbalci,
That would be awesome, but I really don't have the time or skill to make something like this.
I think it would be a cool addition, and I would love to work with someone who wanted to do it, but it's probably not on the horizon just now.
Dear @dwinter
I have tried to make an RStudio addin for this purpose. The code still needs some editing, and I am working on that. If you find it useful, I may make a pull request when it is complete.
- A gif is here:
- Code is here: https://github.com/sbalci/histopathRaddins/blob/master/R/pubmed_search.R