rtracklayer

Importing bigWigs from GEO: `Error in .local(con, format, text, ...) : UCSC library operation failed`

Open bschilder opened this issue 2 years ago • 5 comments

Hello,

rtracklayer has been great for importing various supplementary files from GEO. However, I've run into the following error when trying to import certain bigWig files.

A couple of notes:

  • I've tried importing directly via the URL, or by downloading the file and trying to import it locally (both produce the same error).
  • I see other users have had issues on Windows, but I'm using macOS. #52 #57

Reprex

GEO page. The file comes from dataset GSE188512, from a study led by @dbart1807.

URL <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw" 
query_granges <- GenomicRanges::GRanges("chr6:165169213-167169213")

gr <- rtracklayer::import(con = URL, which = query_granges)
gr <- rtracklayer::import.bw(con = URL, which = query_granges)

Error

 Error in .local(con, format, text, ...) : UCSC library operation failed 

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] echoannot_0.99.4                  BSgenome.Hsapiens.UCSC.hg38_1.4.4
 [3] BSgenome_1.62.0                   rtracklayer_1.54.0               
 [5] Biostrings_2.62.0                 XVector_0.34.0                   
 [7] GenomicRanges_1.46.1              GenomeInfoDb_1.30.1              
 [9] IRanges_2.28.0                    S4Vectors_0.32.4                 
[11] BiocGenerics_0.40.0              

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                          GGally_2.1.2                           
  [3] R.methodsS3_1.8.1                       tidyr_1.2.0                            
  [5] ggplot2_3.3.5                           bit64_4.0.5                            
  [7] knitr_1.38                              DelayedArray_0.20.0                    
  [9] R.utils_2.11.0                          data.table_1.14.2                      
 [11] rpart_4.1.16                            KEGGREST_1.34.0                        
 [13] RCurl_1.98-1.6                          GEOquery_2.62.2                        
 [15] AnnotationFilter_1.18.0                 generics_0.1.2                         
 [17] GenomicFeatures_1.46.5                  RSQLite_2.2.11                         
 [19] shadowtext_0.1.1                        proxy_0.4-26                           
 [21] bit_4.0.4                               tzdb_0.3.0                             
 [23] enrichplot_1.14.2                       xml2_1.3.3                             
 [25] lubridate_1.8.0                         SummarizedExperiment_1.24.0            
 [27] assertthat_0.2.1                        viridis_0.6.2                          
 [29] gargle_1.2.0                            xfun_0.30                              
 [31] hms_1.1.1                               fansi_1.0.3                            
 [33] restfulr_0.0.13                         progress_1.2.2                         
 [35] caTools_1.18.2                          dbplyr_2.1.1                           
 [37] Rgraphviz_2.38.0                        igraph_1.3.0                           
 [39] DBI_1.1.2                               htmlwidgets_1.5.4                      
 [41] reshape_0.8.8                           purrr_0.3.4                            
 [43] ellipsis_0.3.2                          dplyr_1.0.8                            
 [45] backports_1.4.1                         biomaRt_2.50.3                         
 [47] MatrixGenerics_1.6.0                    MungeSumstats_1.3.16                   
 [49] vctrs_0.4.0                             Biobase_2.54.0                         
 [51] ensembldb_2.18.4                        cachem_1.0.6                           
 [53] withr_2.5.0                             ggforce_0.3.3                          
 [55] checkmate_2.0.0                         treeio_1.18.1                          
 [57] GenomicAlignments_1.30.0                prettyunits_1.1.1                      
 [59] cluster_2.1.3                           DOSE_3.20.1                            
 [61] ape_5.6-2                               lazyeval_0.2.2                         
 [63] crayon_1.5.1                            crul_1.2.0                             
 [65] pkgconfig_2.0.3                         tweenr_1.0.2                           
 [67] nlme_3.1-157                            pkgload_1.2.4                          
 [69] ProtGenerics_1.26.0                     XGR_1.1.8                              
 [71] nnet_7.3-17                             rlang_1.0.2                            
 [73] lifecycle_1.0.1                         filelock_1.0.2                         
 [75] httpcode_0.3.0                          BiocFileCache_2.2.1                    
 [77] echotabix_0.99.5                        dichromat_2.0-0                        
 [79] rprojroot_2.0.2                         polyclip_1.10-0                        
 [81] matrixStats_0.61.0                      graph_1.72.0                           
 [83] Matrix_1.4-1                            aplot_0.1.3                            
 [85] osfr_0.2.8                              boot_1.3-28                            
 [87] base64enc_0.1-3                         png_0.1-7                              
 [89] viridisLite_0.4.0                       rjson_0.2.21                           
 [91] clisymbols_1.2.0                        rootSolve_1.8.2.3                      
 [93] bitops_1.0-7                            R.oo_1.24.0                            
 [95] KernSmooth_2.23-20                      ggnetwork_0.5.10                       
 [97] blob_1.2.2                              stringr_1.4.0                          
 [99] qvalue_2.26.0                           regioneR_1.26.1                        
[101] dnet_1.1.7                              gridGraphics_0.5-1                     
[103] readr_2.1.2                             jpeg_0.1-9                             
[105] echodata_0.99.7                         scales_1.1.1                           
[107] memoise_2.0.1                           magrittr_2.0.3                         
[109] plyr_1.8.7                              hexbin_1.28.2                          
[111] gplots_3.1.1                            zlibbioc_1.40.0                        
[113] scatterpie_0.1.7                        compiler_4.1.0                         
[115] echoconda_0.99.5                        BiocIO_1.4.0                           
[117] RColorBrewer_1.1-2                      plotrix_3.8-2                          
[119] Rsamtools_2.10.0                        cli_3.2.0                              
[121] patchwork_1.1.1                         htmlTable_2.4.0                        
[123] Formula_1.2-4                           MASS_7.3-56                            
[125] tidyselect_1.1.2                        stringi_1.7.6                          
[127] yaml_2.3.5                              GOSemSim_2.20.0                        
[129] supraHex_1.32.0                         latticeExtra_0.6-29                    
[131] ggrepel_0.9.1                           grid_4.1.0                             
[133] VariantAnnotation_1.40.0                fastmatch_1.1-3                        
[135] tools_4.1.0                             lmom_2.8                               
[137] parallel_4.1.0                          rstudioapi_0.13                        
[139] foreign_0.8-82                          TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[141] piggyback_0.1.1                         gridExtra_2.3                          
[143] gld_2.6.4                               farver_2.1.0                           
[145] ggraph_2.0.5                            digest_0.6.29                          
[147] BiocManager_1.30.16                     Rcpp_1.0.8.3                           
[149] OrganismDbi_1.36.0                      httr_1.4.2                             
[151] AnnotationDbi_1.56.2                    RCircos_1.2.2                          
[153] ggbio_1.42.0                            biovizBase_1.42.0                      
[155] colorspace_2.0-3                        brio_1.1.3                             
[157] XML_3.99-0.9                            fs_1.5.2                               
[159] reticulate_1.24-9000                    splines_4.1.0                          
[161] yulab.utils_0.0.4                       RBGL_1.70.0                            
[163] tidytree_0.3.9                          expm_0.999-6                           
[165] gh_1.3.0                                graphlayouts_0.8.0                     
[167] Exact_3.1                               ggplotify_0.1.0                        
[169] ggtree_3.2.1                            jsonlite_1.8.0                         
[171] tidygraph_1.2.0                         ggfun_0.0.6                            
[173] testthat_3.1.3                          R6_2.5.1                               
[175] Hmisc_4.6-0                             pillar_1.7.0                           
[177] htmltools_0.5.2                         glue_1.6.2                             
[179] fastmap_1.1.0                           DT_0.22                                
[181] BiocParallel_1.28.3                     class_7.3-20                           
[183] ChIPseeker_1.30.3                       fgsea_1.20.0                           
[185] mvtnorm_1.1-3                           utf8_1.2.2                             
[187] lattice_0.20-45                         tibble_3.1.6                           
[189] curl_4.3.2                              DescTools_0.99.44                      
[191] gtools_3.9.2                            zip_2.2.0                              
[193] GO.db_3.14.0                            openxlsx_4.2.5                         
[195] survival_3.3-1                          limma_3.50.1                           
[197] googleAuthR_2.0.0                       desc_1.4.1                             
[199] munsell_0.5.0                           e1071_1.7-9                            
[201] DO.db_2.9                               GenomeInfoDbData_1.2.7                 
[203] reshape2_1.4.4                          gtable_0.3.0  

Many thanks in advance, Brian

bschilder, Apr 01 '22 15:04

Hi @bschilder,

I tried to replicate it on Linux and I think it should behave similarly on macOS.

> gr <- rtracklayer::import.bw(con = URL, which = query_granges)
#R: TCP non-blocking connect() to ftp.ncbi.nlm.nih.gov timed-out in select() after 10000 milliseconds - Cancelling!: Operation now in progress
#Error in .local(con, format, text, ...) : UCSC library operation failed
#In addition: Warning message:
#In .local(con, format, text, ...) :
#  Can't get data socket for ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw

The request to the URL timed out because the UCSC kent library, on which rtracklayer relies, limits the FTP connection attempt to 10000 milliseconds. Hence the error states that the UCSC library operation failed.

Solution: it should work if you change the protocol from ftp to http, for example:

suppressPackageStartupMessages(library(rtracklayer))
URL <- "http://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw" 
query_granges <- GenomicRanges::GRanges("chr6:165169213-167169213")

gr <- rtracklayer::import(con = URL, which = query_granges)
gr <- rtracklayer::import.bw(con = URL, which = query_granges)

It's surprising that import does not work on the local file on macOS. If you could provide logs, that would be helpful.

sanchit-saini, Apr 05 '22 11:04

Aha, the "http://" prefix did the trick! Never realized you could do that.

Here are the outputs from my original reprex. Apologies for not thinking to include these earlier.

Error in seqinfo(con) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(con) :
  TCP non-blocking connect() to ftp.ncbi.nlm.nih.gov timed-out in select() after 10000 milliseconds - Cancelling!
2: In seqinfo(con) :
[Screenshot 2022-04-06 at 14 32 44]

I'll go ahead and add a conditional to my functions that makes sure all ftp URLs have the "http://" prefix. Would it make sense to add this feature internally to rtracklayer as well?
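A minimal sketch of such a conditional, in base R. The helper name `ftp_to_http` is hypothetical and not part of rtracklayer, and the rewrite only helps when the host serves the same path over HTTP, as ftp.ncbi.nlm.nih.gov did in this thread.

```r
# Hypothetical helper (not part of rtracklayer): swap a leading ftp:// for http://.
# Assumes the server exposes the same path over HTTP, which is not true of every host.
ftp_to_http <- function(url) {
  # Only the scheme at the very start of the string is rewritten;
  # URLs that already use another scheme are returned unchanged.
  sub("^ftp://", "http://", url, ignore.case = TRUE)
}

URL <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw"
http_URL <- ftp_to_http(URL)
```

The result can then be passed as `con` to `rtracklayer::import()` exactly as in the reply above.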

Thank you so much for the quick reply and solution.

All the best, Brian

bschilder, Apr 06 '22 13:04

> Would it make sense to add this feature internally to rtracklayer as well?

rtracklayer cannot modify or insert the prefix of a URI.

The only way we get information about the protocol is from the prefix of the URI. Hence, the burden of providing the correct prefix is on the user.

Without knowing the correct protocol, we don't know how to communicate with the resource, so we cannot operate on it.

The error in the screenshot occurred because the protocol is missing from the URL.
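Since the prefix is the user's responsibility, a caller could check for it before importing. The following is only a sketch in base R: `has_scheme` is a hypothetical name, and the pattern is a simplified reading of the URI scheme syntax, not what rtracklayer does internally.

```r
# Hypothetical check (not part of rtracklayer): does the string start with
# something that looks like "scheme://"?
has_scheme <- function(x) {
  grepl("^[A-Za-z][A-Za-z0-9+.-]*://", x)
}

has_scheme("ftp://ftp.ncbi.nlm.nih.gov/geo/samples/")  # prefix present
has_scheme("ftp.ncbi.nlm.nih.gov/geo/samples/")        # prefix missing
```

Note that a local file path also has no scheme, so a check like this only makes sense for strings that are meant to be remote URLs.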

Hope this helps. Thanks

sanchit-saini, Apr 09 '22 16:04

In the original example I gave, the ftp:// prefix was included and gave the same error as without it. So I don't think the error in my most recent example was exclusively due to the omission of the ftp:// prefix (though it may very well have contributed).

However, now (as of April 10th 2022) I'm noticing that including the ftp:// prefix (without replacing it with http://) works when it didn't before. Has something changed with rtracklayer since my original post? Can you think of some reason for the inconsistency?

bschilder, Apr 10 '22 09:04

> Has something changed with rtracklayer since my original post?

Nothing's changed; it is at the same commit: https://git.bioconductor.org/packages/rtracklayer

I tried to debug it, and it seems to work as expected against a local FTP server. However, I had no success with the provided FTP URL: ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw

> I'm noticing that including the ftp:// prefix (without replacing it with http://) works when it didn't before.

Was it the same FTP URL or some other URL? Can you provide the URL which worked?

> Can you think of some reason for the inconsistency?

At this moment, I'm not sure.
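Given that the FTP URL appears to work only some of the time, one user-side option is to try the URL as given and retry over HTTP if the import fails. This is a sketch under assumptions, not rtracklayer behaviour: `import_with_fallback` and its `import_fun` argument are hypothetical names, and the fallback presumes the host serves the same path over HTTP.

```r
# Hypothetical wrapper (not part of rtracklayer): try the URL as given,
# and if that errors and the URL starts with ftp://, retry with http://.
import_with_fallback <- function(url, ..., import_fun = rtracklayer::import) {
  tryCatch(
    import_fun(con = url, ...),
    error = function(e) {
      http_url <- sub("^ftp://", "http://", url, ignore.case = TRUE)
      # Nothing to fall back to: re-signal the original error.
      if (identical(http_url, url)) stop(e)
      message("Import failed (", conditionMessage(e), "); retrying over HTTP")
      import_fun(con = http_url, ...)
    }
  )
}
```

With rtracklayer installed, a call would look like `import_with_fallback(URL, which = query_granges)`. The importer is passed as an argument so that the wrapper can be exercised with a stub, without network access.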

sanchit-saini, Apr 12 '22 22:04