
EnsemblBacteria: Error: '/tmp/RtmppaSXUW/EnsemblBacteria.txt' does not exist.

Open ndaniel opened this issue 5 years ago • 4 comments

When running this:

> library(biomartr)
> biomartr::meta.retrieval(kingdom = "EnsemblBacteria", db = "ensemblgenomes", type = "genome")

the following error shows up:

Starting retrieval of information for all species stored in ENSEMBLGENOMES... This needs to be done only once.
Starting meta retrieval of all genome files for kingdom: EnsemblBacteria from database: ensemblgenomes.


Generating folder EnsemblBacteria ...
Skipping already downloaded species: It seems like there are some files in download folder that are neither pre-downloaded species files nor doc_ or md5checksum files.


Starting genome retrieval of 'Chryseobacterium sp. Hurlbut01' from ensemblgenomes ...


Error: '/tmp/RtmppaSXUW/EnsemblBacteria.txt' does not exist.
In addition: Warning messages:
1: In .f(.x[[i]], ...) :
  It seems like there are some files in download folder that are neither pre-downloaded species files nor doc_ or md5checksum files.
2: The FTP link: 'ftp://ftp.ensemblgenomes.org/pub/current/bacteria/species_EnsemblBacteria.txt' is not available. This might be due to an instable internet connection, a firewall issue, or wrong organism name. 

The internet connection works fine, and downloading the same file with wget also succeeds:

wget ftp://ftp.ensemblgenomes.org/pub/current/bacteria/species_EnsemblBacteria.txt

More info:

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2 biomartr_0.8.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0           bindr_0.1.1          compiler_3.5.2      
 [4] pillar_1.3.1         XVector_0.22.0       prettyunits_1.0.2   
 [7] bitops_1.0-6         tools_3.5.2          progress_1.2.0      
[10] zlibbioc_1.28.0      biomaRt_2.38.0       digest_0.6.18       
[13] bit_1.1-14           jsonlite_1.6         RSQLite_2.1.1       
[16] memoise_1.1.0        tibble_2.0.1         pkgconfig_2.0.2     
[19] rlang_0.3.1          DBI_1.0.0            curl_3.3            
[22] parallel_3.5.2       dplyr_0.7.8          stringr_1.3.1       
[25] httr_1.4.0           Biostrings_2.50.2    S4Vectors_0.20.1    
[28] IRanges_2.16.0       hms_0.4.2            tidyselect_0.2.5    
[31] stats4_3.5.2         bit64_0.9-7          glue_1.3.0          
[34] Biobase_2.42.0       R6_2.3.0             AnnotationDbi_1.44.0
[37] XML_3.98-1.16        readr_1.3.1          purrr_0.2.5         
[40] blob_1.1.1           magrittr_1.5         BiocGenerics_0.28.0 
[43] assertthat_0.2.0     stringi_1.2.4        RCurl_1.95-4.11     
[46] crayon_1.3.4        

ndaniel avatar Feb 19 '19 12:02 ndaniel

Hi Daniel,

Thank you for your error report.

It seems like the internal file '/tmp/RtmppaSXUW/EnsemblBacteria.txt' cannot be stored on your system. Would you mind installing the developer version of biomartr from GitHub and running the same function again to see whether the error persists on your system? On my macOS system, the function runs without problems using the developer version.

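# install developer version from GitHub
# (biocLite() has been superseded by BiocManager for R >= 3.5;
#  installing from GitHub additionally needs the remotes package)
install.packages(c("BiocManager", "remotes"))
BiocManager::install("ropensci/biomartr")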

Many thanks, Hajk

HajkD avatar Mar 05 '19 17:03 HajkD

I have this same problem with the latest version of biomartr. This is happening on a shared compute server. The actual error message I receive is:

> meta.retrieval(db="genbank", kingdom="archaea")
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/archaea/assembly_summary.txt'
Something went wrong when trying to access the FTP site 'ftp://ftp.ncbi.nlm.nih.gov/'. Sometimes the internet connection isn't stable and re-running the function might help. Otherwise, could there be an issue with the firewall?. Is the the FTP site 'ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/archaea/assembly_summary.txt' currently available?
Error: '/tmp/RtmpaBf6w5/assembly_summary_archaea_genbank.txt' does not exist.
In addition: Warning message:
In download.file(url, ...) :
  URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/archaea/assembly_summary.txt': Timeout of 60 seconds was reached
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /lila/home/funnellt/miniconda3/lib/libopenblasp-r0.3.15.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomartr_1.0.2

loaded via a namespace (and not attached):
 [1] KEGGREST_1.32.0        progress_1.2.2         tidyselect_1.1.1
 [4] remotes_2.4.0          purrr_0.3.4            vctrs_0.3.8
 [7] generics_0.1.0         stats4_4.1.0           BiocFileCache_2.0.0
[10] utf8_1.2.1             blob_1.2.1             XML_3.99-0.6
[13] rlang_0.4.11           pillar_1.6.1           glue_1.4.2
[16] DBI_1.1.1              rappdirs_0.3.3         BiocGenerics_0.38.0
[19] bit64_4.0.5            dbplyr_2.1.1           GenomeInfoDbData_1.2.6
[22] lifecycle_1.0.0        stringr_1.4.0          zlibbioc_1.38.0
[25] Biostrings_2.60.1      memoise_2.0.0          Biobase_2.52.0
[28] IRanges_2.26.0         fastmap_1.1.0          biomaRt_2.48.1
[31] GenomeInfoDb_1.28.0    parallel_4.1.0         curl_4.3.2
[34] AnnotationDbi_1.54.1   fansi_0.5.0            Rcpp_1.0.6
[37] filelock_1.0.2         BiocManager_1.30.16    cachem_1.0.5
[40] S4Vectors_0.30.0       XVector_0.32.0         bit_4.0.4
[43] hms_1.1.0              png_0.1-7              digest_0.6.27
[46] stringi_1.6.2          dplyr_1.0.7            tools_4.1.0
[49] bitops_1.0-7           magrittr_2.0.1         RCurl_1.98-1.3
[52] RSQLite_2.2.5          tibble_3.1.2           crayon_1.4.1
[55] pkgconfig_2.0.3        ellipsis_0.3.2         xml2_1.3.2
[58] prettyunits_1.1.1      assertthat_0.2.1       httr_1.4.2
[61] R6_2.5.0               compiler_4.1.0

funnell avatar Jun 28 '21 21:06 funnell

Did you check whether a firewall in your server environment is blocking access to the GenBank FTP servers?

HajkD avatar Jun 29 '21 14:06 HajkD

I'm able to connect to the server using ftp:

❯ ftp ftp.ncbi.nlm.nih.gov
Trying 130.14.250.10...
Connected to ftp.ncbi.nlm.nih.gov (130.14.250.10).
220-

Can you think of anything else I should check? Thank you very much for your time and help!

funnell avatar Jul 01 '21 19:07 funnell
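
One further check that may be worth trying, sketched here under assumptions: the warning in the report above shows R's `download.file()` hitting its default 60-second timeout, so a successful connection from the `ftp` client does not show that R itself can fetch the file in time. The helper below, `check_download()`, is a hypothetical name and not part of biomartr. It temporarily raises the `timeout` option, attempts the download into a temporary file, and reports whether a non-empty file arrived.

```r
# Hypothetical helper (not a biomartr function): test whether R can download
# `url` when given more time than the default 60-second timeout.
check_download <- function(url, timeout_sec = 300) {
  old <- options(timeout = timeout_sec)  # raise the download timeout
  on.exit(options(old), add = TRUE)      # restore the previous setting on exit

  dest <- tempfile(fileext = ".txt")
  ok <- tryCatch(
    # download.file() returns 0 (invisibly) on success
    utils::download.file(url, destfile = dest, quiet = TRUE) == 0,
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
  # Count the download as successful only if a non-empty file was written.
  ok && file.exists(dest) && file.size(dest) > 0
}
```

Calling `check_download("ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/archaea/assembly_summary.txt")` on the affected server would return `TRUE` if the file can be fetched within the longer timeout and `FALSE` otherwise. If it returns `TRUE`, setting `options(timeout = 300)` before calling `meta.retrieval()` might be enough, assuming biomartr relies on `download.file()` as the warning suggests. If it returns `FALSE`, a firewall or proxy restriction on FTP traffic from R remains a plausible cause.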

This error was caused by the old URL setup. With the generalized URL fetcher, it no longer occurs (as long as your firewall is not too strict).

This issue can now be closed.

Roleren avatar Sep 27 '23 10:09 Roleren