rtracklayer
Importing bigWigs from GEO: ` Error in .local(con, format, text, ...) : UCSC library operation failed `
Hello,
`rtracklayer` has been great for importing various supplementary files from GEO. However, I've run into the following error when trying to import certain bigWig files.
A couple of notes:
- I've tried importing directly via the URL and by downloading the file and importing it locally (both produce the same error).
- I see other users have had issues on Windows, but I'm using macOS. #52 #57
Reprex
GEO page. Comes from dataset GSE188512 in a study led by @dbart1807
URL <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw"
query_granges <- GenomicRanges::GRanges("chr6:165169213-167169213")
gr <- rtracklayer::import(con = URL, which = query_granges)
gr <- rtracklayer::import.bw(con = URL, which = query_granges)
Error
Error in .local(con, format, text, ...) : UCSC library operation failed
Session info
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] echoannot_0.99.4 BSgenome.Hsapiens.UCSC.hg38_1.4.4
[3] BSgenome_1.62.0 rtracklayer_1.54.0
[5] Biostrings_2.62.0 XVector_0.34.0
[7] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[9] IRanges_2.28.0 S4Vectors_0.32.4
[11] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 GGally_2.1.2
[3] R.methodsS3_1.8.1 tidyr_1.2.0
[5] ggplot2_3.3.5 bit64_4.0.5
[7] knitr_1.38 DelayedArray_0.20.0
[9] R.utils_2.11.0 data.table_1.14.2
[11] rpart_4.1.16 KEGGREST_1.34.0
[13] RCurl_1.98-1.6 GEOquery_2.62.2
[15] AnnotationFilter_1.18.0 generics_0.1.2
[17] GenomicFeatures_1.46.5 RSQLite_2.2.11
[19] shadowtext_0.1.1 proxy_0.4-26
[21] bit_4.0.4 tzdb_0.3.0
[23] enrichplot_1.14.2 xml2_1.3.3
[25] lubridate_1.8.0 SummarizedExperiment_1.24.0
[27] assertthat_0.2.1 viridis_0.6.2
[29] gargle_1.2.0 xfun_0.30
[31] hms_1.1.1 fansi_1.0.3
[33] restfulr_0.0.13 progress_1.2.2
[35] caTools_1.18.2 dbplyr_2.1.1
[37] Rgraphviz_2.38.0 igraph_1.3.0
[39] DBI_1.1.2 htmlwidgets_1.5.4
[41] reshape_0.8.8 purrr_0.3.4
[43] ellipsis_0.3.2 dplyr_1.0.8
[45] backports_1.4.1 biomaRt_2.50.3
[47] MatrixGenerics_1.6.0 MungeSumstats_1.3.16
[49] vctrs_0.4.0 Biobase_2.54.0
[51] ensembldb_2.18.4 cachem_1.0.6
[53] withr_2.5.0 ggforce_0.3.3
[55] checkmate_2.0.0 treeio_1.18.1
[57] GenomicAlignments_1.30.0 prettyunits_1.1.1
[59] cluster_2.1.3 DOSE_3.20.1
[61] ape_5.6-2 lazyeval_0.2.2
[63] crayon_1.5.1 crul_1.2.0
[65] pkgconfig_2.0.3 tweenr_1.0.2
[67] nlme_3.1-157 pkgload_1.2.4
[69] ProtGenerics_1.26.0 XGR_1.1.8
[71] nnet_7.3-17 rlang_1.0.2
[73] lifecycle_1.0.1 filelock_1.0.2
[75] httpcode_0.3.0 BiocFileCache_2.2.1
[77] echotabix_0.99.5 dichromat_2.0-0
[79] rprojroot_2.0.2 polyclip_1.10-0
[81] matrixStats_0.61.0 graph_1.72.0
[83] Matrix_1.4-1 aplot_0.1.3
[85] osfr_0.2.8 boot_1.3-28
[87] base64enc_0.1-3 png_0.1-7
[89] viridisLite_0.4.0 rjson_0.2.21
[91] clisymbols_1.2.0 rootSolve_1.8.2.3
[93] bitops_1.0-7 R.oo_1.24.0
[95] KernSmooth_2.23-20 ggnetwork_0.5.10
[97] blob_1.2.2 stringr_1.4.0
[99] qvalue_2.26.0 regioneR_1.26.1
[101] dnet_1.1.7 gridGraphics_0.5-1
[103] readr_2.1.2 jpeg_0.1-9
[105] echodata_0.99.7 scales_1.1.1
[107] memoise_2.0.1 magrittr_2.0.3
[109] plyr_1.8.7 hexbin_1.28.2
[111] gplots_3.1.1 zlibbioc_1.40.0
[113] scatterpie_0.1.7 compiler_4.1.0
[115] echoconda_0.99.5 BiocIO_1.4.0
[117] RColorBrewer_1.1-2 plotrix_3.8-2
[119] Rsamtools_2.10.0 cli_3.2.0
[121] patchwork_1.1.1 htmlTable_2.4.0
[123] Formula_1.2-4 MASS_7.3-56
[125] tidyselect_1.1.2 stringi_1.7.6
[127] yaml_2.3.5 GOSemSim_2.20.0
[129] supraHex_1.32.0 latticeExtra_0.6-29
[131] ggrepel_0.9.1 grid_4.1.0
[133] VariantAnnotation_1.40.0 fastmatch_1.1-3
[135] tools_4.1.0 lmom_2.8
[137] parallel_4.1.0 rstudioapi_0.13
[139] foreign_0.8-82 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[141] piggyback_0.1.1 gridExtra_2.3
[143] gld_2.6.4 farver_2.1.0
[145] ggraph_2.0.5 digest_0.6.29
[147] BiocManager_1.30.16 Rcpp_1.0.8.3
[149] OrganismDbi_1.36.0 httr_1.4.2
[151] AnnotationDbi_1.56.2 RCircos_1.2.2
[153] ggbio_1.42.0 biovizBase_1.42.0
[155] colorspace_2.0-3 brio_1.1.3
[157] XML_3.99-0.9 fs_1.5.2
[159] reticulate_1.24-9000 splines_4.1.0
[161] yulab.utils_0.0.4 RBGL_1.70.0
[163] tidytree_0.3.9 expm_0.999-6
[165] gh_1.3.0 graphlayouts_0.8.0
[167] Exact_3.1 ggplotify_0.1.0
[169] ggtree_3.2.1 jsonlite_1.8.0
[171] tidygraph_1.2.0 ggfun_0.0.6
[173] testthat_3.1.3 R6_2.5.1
[175] Hmisc_4.6-0 pillar_1.7.0
[177] htmltools_0.5.2 glue_1.6.2
[179] fastmap_1.1.0 DT_0.22
[181] BiocParallel_1.28.3 class_7.3-20
[183] ChIPseeker_1.30.3 fgsea_1.20.0
[185] mvtnorm_1.1-3 utf8_1.2.2
[187] lattice_0.20-45 tibble_3.1.6
[189] curl_4.3.2 DescTools_0.99.44
[191] gtools_3.9.2 zip_2.2.0
[193] GO.db_3.14.0 openxlsx_4.2.5
[195] survival_3.3-1 limma_3.50.1
[197] googleAuthR_2.0.0 desc_1.4.1
[199] munsell_0.5.0 e1071_1.7-9
[201] DO.db_2.9 GenomeInfoDbData_1.2.7
[203] reshape2_1.4.4 gtable_0.3.0
Many thanks in advance, Brian
Hi @bschilder,
I tried to replicate it on Linux and I think it should behave similarly on macOS.
> gr <- rtracklayer::import.bw(con = URL, which = query_granges)
#R: TCP non-blocking connect() to ftp.ncbi.nlm.nih.gov timed-out in select() after 10000 milliseconds - Cancelling!: Operation now in progress
#Error in .local(con, format, text, ...) : UCSC library operation failed
#In addition: Warning message:
#In .local(con, format, text, ...) :
# Can't get data socket for ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw
The request to the URL times out because the UCSC kent library, on which `rtracklayer` relies, gives up on the FTP connection after 10000 milliseconds. Hence the error states that the UCSC library operation failed.
Solution: it should work if you change the protocol from `ftp` to `http`, for example:
suppressPackageStartupMessages(library(rtracklayer))
URL <- "http://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw"
query_granges <- GenomicRanges::GRanges("chr6:165169213-167169213")
gr <- rtracklayer::import(con = URL, which = query_granges)
gr <- rtracklayer::import.bw(con = URL, which = query_granges)
It's surprising that `import` is not working locally on macOS. If you could provide logs, it would be helpful.
Aha, the "http://" prefix did the trick! Never realized you could do that.
Here are the outputs from my original reprex. Apologies for not thinking to include these earlier.
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(con) :
TCP non-blocking connect() to ftp.ncbi.nlm.nih.gov timed-out in select() after 10000 milliseconds - Cancelling!
2: In seqinfo(con) :
[Screenshot 2022-04-06 at 14 32 44]
I'll go ahead and add a conditional to my functions that makes sure all ftp URLs have the "http://" prefix. Would it make sense to add this feature internally to `rtracklayer` as well?
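A minimal sketch of the kind of conditional I have in mind. The function name `ftp_to_http` is hypothetical (it is not part of `rtracklayer`), and it assumes the server exposes the same paths over HTTP, which seems to hold for ftp.ncbi.nlm.nih.gov but may not hold for other hosts:

```r
# Hypothetical helper, not an rtracklayer function.
# Rewrites a leading "ftp://" to "http://"; any other URL is returned unchanged.
# Assumption: the host serves the same file paths over HTTP.
ftp_to_http <- function(url) {
  sub("^ftp://", "http://", url, ignore.case = TRUE)
}

URL <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw"
http_URL <- ftp_to_http(URL)
```

The rewritten `http_URL` could then be passed as `con` to `rtracklayer::import.bw()` in place of the original URL.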
Thank you so much for the quick reply and solution.
All the best, Brian
> Would it make sense to add this feature internally to `rtracklayer` as well?
`rtracklayer` cannot modify or insert the prefix of a URI.
The only way we get information about the protocol is from the prefix of the URI. Hence, the burden of providing the correct prefix is on the user.
Without knowing the correct protocol, we don't know how to communicate with the resource, so we cannot operate on it.
The error in the screenshot occurred because the protocol is not present in the URL.
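To illustrate the point about the prefix, here is a rough sketch of a user-side check. The helper `url_scheme` is hypothetical and is not how `rtracklayer` parses URIs internally; it only shows that a URL without a scheme carries no protocol information:

```r
# Hypothetical helper, not an rtracklayer function.
# Returns the lower-cased scheme of a single URL (the part before "://"),
# or NA if the URL has no scheme prefix.
url_scheme <- function(url) {
  m <- regexpr("^[A-Za-z][A-Za-z0-9+.-]*(?=://)", url, perl = TRUE)
  if (as.integer(m) == -1L) NA_character_ else tolower(regmatches(url, m))
}

with_prefix    <- url_scheme("ftp://ftp.ncbi.nlm.nih.gov/geo/samples/file.bw")
without_prefix <- url_scheme("ftp.ncbi.nlm.nih.gov/geo/samples/file.bw")
```

Here `with_prefix` is `"ftp"`, while `without_prefix` is `NA` because nothing in the string names a protocol.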
Hope this helps. Thanks
In the original example I gave, the ftp:// prefix was included and gave the same error as without it. So I don't think the error in my most recent example was exclusively due to the omission of the ftp:// prefix (though it may very well have contributed).
However, now (as of April 10th 2022) I'm noticing that including the ftp:// prefix (without replacing it with http://) works when it didn't before. Has something changed with `rtracklayer` since my original post? Can you think of some reason for the inconsistency?
> Has something changed with `rtracklayer` since my original post?
Nothing's changed. It is at the same commit. https://git.bioconductor.org/packages/rtracklayer
I tried to debug it, and it seems to work as expected against a local FTP server, although I had no success with the provided FTP URL `ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw`.
> I'm noticing that including the ftp:// prefix (without replacing it with http://) works when it didn't before.
Was it the same FTP URL or some other URL? Can you provide the URL which worked?
> Can you think of some reason for the inconsistency?
At this moment, I'm not sure.