
How to download large Clinvar tabular table

Open manburst opened this issue 1 year ago • 2 comments

Dear Sir,

I am working on a project that requires looking up a lot of variant data in ClinVar. This is time-consuming because ClinVar seems to limit each search to 20 queries. For example, to get the data for 2 specific variants I search like this:

(2[chr] AND 47168774[chrpos37] AND TTC7A[gene] AND p.Leu32Val[varname] AND c.94C>G[varname]) OR (3[chr] AND 9974774[chrpos37] AND IL17RC[gene] AND p.Leu625Val[varname] AND c.1873C>G[varname])

I get a result like the one in the screenshot, where you will notice a Download option that can export tabular data (a table file, as in the image).

Can the rentrez package download this tabular file, and could you please give an example of a command to fetch the data? Also, how can I deal with the limit on the number of queries per search when using a command?

Thanks in advance,
JK
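The OR-of-AND query in the post above can be assembled from a table of variants rather than typed by hand. The following is only a sketch: `build_clinvar_term()` and the column names (`chr`, `pos37`, `gene`, `protein`, `cdna`) are hypothetical and not part of rentrez. The 20-result cap described above most likely comes from E-utilities ESearch returning 20 IDs by default, which the `retmax` argument raises; the value 500 in the commented-out call is an arbitrary choice.

```r
# Build one parenthesised AND-clause per variant, then join the clauses with OR.
# `variants` is assumed to be a data frame with character columns
# chr, pos37, gene, protein and cdna (hypothetical names).
build_clinvar_term <- function(variants) {
  clauses <- sprintf(
    "(%s[chr] AND %s[chrpos37] AND %s[gene] AND %s[varname] AND %s[varname])",
    variants$chr, variants$pos37, variants$gene, variants$protein, variants$cdna
  )
  paste(clauses, collapse = " OR ")
}

# The two variants from the question; positions are kept as strings so that
# they are never printed in scientific notation.
variants <- data.frame(
  chr     = c("2", "3"),
  pos37   = c("47168774", "9974774"),
  gene    = c("TTC7A", "IL17RC"),
  protein = c("p.Leu32Val", "p.Leu625Val"),
  cdna    = c("c.94C>G", "c.1873C>G"),
  stringsAsFactors = FALSE
)

term <- build_clinvar_term(variants)

# Not run here because it needs network access:
# search <- rentrez::entrez_search(db = "clinvar", term = term,
#                                  retmax = 500, use_history = TRUE)
```

With many variants the term can grow long, so it may be worth splitting the variant table into batches and searching each batch separately.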

manburst, Jul 20 '22 02:07

The package maintainer isn't currently available to reply. You may be able to find what you need in the Entrez Utilities documentation. The ClinVar docs do say that ClinVar is available via E-utilities.

Apologies that I don't have more time to assist myself.

allenbaron, Jul 20 '22 13:07

library(rentrez)
library(tidyverse)  # provides %>%, as_tibble(), rownames_to_column(), unnest_wider()/unnest_longer()

q1 <- "(2[chr] AND 47168774[chrpos37] AND TTC7A[gene] AND p.Leu32Val[varname] AND c.94C>G[varname]) OR (3[chr] AND 9974774[chrpos37] AND IL17RC[gene] AND p.Leu625Val[varname] AND c.1873C>G[varname])"

# search for gene or topic of interest
search <- entrez_search(db = "clinvar", term = q1, use_history = TRUE)

# by adding retmode = "xml", it will put out 9999 ClinVar variants at maximum;
# if you have more than 9999, figure out a way to chunk them up.
summary <- entrez_summary(db = "clinvar", web_history = search$web_history, retmode = "xml")
summary

summary_cv <- extract_from_esummary(summary, c("obj_type", "accession", "accession_version", "title", "variation_set", "trait_set", "supporting_submissions", "clinical_significance", "record_status", "gene_sort", "chr_sort", "location_sort", "variation_set_name", "variation_set_id", "genes", "protein_change", "fda_recognized_database"))

# the output is a matrix; this code transposes the data (t()) and turns the result into a tibble;
# from here you can unnest_wider or unnest_longer to pull out the list columns
cv_extract_final <- summary_cv %>%
  t() %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column(var = "ID")
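For result sets larger than a single request can return, one possible way to "chunk them up" is to page through the stored web history with the E-utilities `retstart` and `retmax` parameters, which rentrez forwards to the API. This is a sketch under assumptions, not a tested recipe for ClinVar: `chunk_starts()` and `fetch_summaries_chunked()` are hypothetical helper names, and the chunk size of 500 is arbitrary.

```r
# Zero-based start offsets (as E-utilities expects) for fetching `total`
# records in chunks of `chunk_size`.
chunk_starts <- function(total, chunk_size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = chunk_size)
}

# Hypothetical helper: request the summaries one chunk at a time from the
# web history stored by entrez_search(..., use_history = TRUE).
# Returns a list with one esummary list per chunk; run
# extract_from_esummary() on each element and combine the results afterwards.
fetch_summaries_chunked <- function(search, chunk_size = 500) {
  starts <- chunk_starts(search$count, chunk_size)
  lapply(starts, function(start) {
    rentrez::entrez_summary(
      db = "clinvar",
      web_history = search$web_history,
      retstart = start,
      retmax = chunk_size,
      always_return_list = TRUE  # keep a list even if a chunk holds one record
    )
  })
}
```

Calling `fetch_summaries_chunked(search)` with the `search` object from the code above would issue one request per chunk. Keeping the chunks as separate objects preserves their class, so `extract_from_esummary()` still works on each one.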

vestalgd, Oct 16 '23 04:10