
Error in build_annotations in mm10: Error in download.file(url, destfile, quiet = TRUE)

Open GitR-Bio opened this issue 1 year ago • 6 comments

Hi, I'm getting the following error from build_annotations. Any ideas would be highly appreciated.

loading from cache
Error in download.file(url, destfile, quiet = TRUE) : 
  cannot open URL 'https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz'
  

The detailed commands are below:

> library(annotatr)

> annots=c("mm10_genes_1to5kb", "mm10_genes_promoters", "mm10_genes_cds", "mm10_genes_5UTRs", "mm10_genes_exons", "mm10_genes_firstexons","mm10_genes_introns","mm10_genes_intronexonboundaries","mm10_genes_exonintronboundaries","mm10_genes_3UTRs", "mm10_genes_intergenic","mm10_enhancers_fantom","mm10_lncrna_gencode" )

> BiocManager::install("TxDb.Mmusculus.UCSC.mm10.knownGene")
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.2 (2023-10-31 ucrt)
Installation paths not writeable, unable to update packages
  path: C:/Program Files/R/R-4.3.2/library
  packages:
    cluster, foreign, lattice, MASS, Matrix, mgcv, nlme, rpart
Warning message:
package(s) not installed when version(s) same as or greater than current; use `force = TRUE` to re-install:
  'TxDb.Mmusculus.UCSC.mm10.knownGene' 

> annotations = build_annotations(genome = 'mm10', annotations = annots)
Building enhancers...
snapshotDate(): 2023-10-23
loading from cache
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building intergenic...
Building cds...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building first exons...
Building introns...
Building intron exon boundaries...
Building exon intron boundaries...
snapshotDate(): 2023-10-23
Building lncRNA transcripts...
loading from cache
Error in download.file(url, destfile, quiet = TRUE) : 
  cannot open URL 'https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz'
Called from: download.file(url, destfile, quiet = TRUE)
Browse[1]> 

> annotatr_cache$list_env()
character(0)

Thanks in advance.

GitR-Bio avatar Feb 07 '24 08:02 GitR-Bio

Similar problem when attempting to build annotations

 build_annotations(genome = 'mm10', annotations = "mm10_cpgs")

I then get back the error

Building CpG islands...
Error in open.connection(5L, "rb") : HTTP error 403.

I've tried hg38 and get the same problem. I've also tried multiple machines in different geographic locations, with the same result.
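
To separate a server-side problem from an annotatr problem, the URL from the error message can be tested directly in base R. The following is only a sketch: `can_download` is a hypothetical helper name, not part of annotatr, and it simply reports whether `download.file` succeeds for a given URL.

```r
# Hypothetical helper (not part of annotatr): returns TRUE if the URL can be
# downloaded to a temporary file, FALSE if download.file() raises an error.
can_download <- function(url) {
  dest <- tempfile()
  on.exit(unlink(dest), add = TRUE)
  tryCatch({
    # download.file() returns 0 on success; warnings are silenced so that
    # only a hard failure decides the result.
    status <- suppressWarnings(
      download.file(url, dest, quiet = TRUE, mode = "wb")
    )
    status == 0
  }, error = function(e) FALSE)
}

# Usage (not run here, needs network access):
# can_download("https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz")
```

If this returns FALSE outside of annotatr as well, the failure is most likely on the network or server side rather than in the package.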

Any ideas/suggestions please?

mbassalbioinformatics avatar Feb 08 '24 11:02 mbassalbioinformatics

It seems we've reached the end of the road for the URLs being stable and accessible, and this is causing failure on the Bioconductor build machines as well.

I'll be looking for Bioconductor packages that provide these resources, to avoid downloading from brittle links.

Unfortunately work responsibilities will prevent me from working on this until next week at the earliest.

rcavalcante avatar Feb 08 '24 15:02 rcavalcante

Ok so it seems in the build_annotation.R file, the following URL

http://hgdownload.cse.ucsc.edu/

needs to be changed to

http://hgdownload2.cse.ucsc.edu/

So, change the lines in question (5 in total, if I remember correctly), save, re-archive the folder as a tar.gz, and reinstall the package from the archive. That seems to have worked for me.
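
The manual edit described above could also be scripted in R. This is a rough sketch under assumptions: the package source has been unpacked into a local directory (called "annotatr" here for illustration), and the helper names `patch_ucsc_host` and `patch_package_sources` are hypothetical, not part of annotatr. It rewrites the host name in every file under the package's R/ directory instead of editing the lines by hand.

```r
# Replace the UCSC download host in a character vector of source lines.
# fixed = TRUE treats the host name literally, so "." is not a regex wildcard.
patch_ucsc_host <- function(lines,
                            old = "hgdownload.cse.ucsc.edu",
                            new = "hgdownload2.cse.ucsc.edu") {
  gsub(old, new, lines, fixed = TRUE)
}

# Patch every .R file under <pkg_dir>/R and return the paths that changed.
patch_package_sources <- function(pkg_dir) {
  r_files <- list.files(file.path(pkg_dir, "R"),
                        pattern = "\\.R$", full.names = TRUE)
  changed <- character(0)
  for (f in r_files) {
    original <- readLines(f, warn = FALSE)
    patched <- patch_ucsc_host(original)
    if (!identical(original, patched)) {
      writeLines(patched, f)
      changed <- c(changed, f)
    }
  }
  changed
}

# Usage (not run; "annotatr" is an assumed path to the unpacked source):
# patch_package_sources("annotatr")
# install.packages("annotatr", repos = NULL, type = "source")
```

Installing from the source directory with `repos = NULL, type = "source"` avoids the re-archiving step; restarting the R session afterwards is still advisable.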

mbassalbioinformatics avatar Feb 08 '24 23:02 mbassalbioinformatics

Thank you so much for your valuable responses. The connection has probably been re-established in my case, but a warning now appears at the end of the output.

annotations = build_annotations(genome = 'mm10', annotations = annots)
Building enhancers...
snapshotDate(): 2023-10-23
loading from cache
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories", package = "BiocManager")' for
details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/
'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building intergenic...
Building cds...
Building 5UTRs...
Building 3UTRs...
Building exons...
Building first exons...
Building introns...
Building intron exon boundaries...
Building exon intron boundaries...
snapshotDate(): 2023-10-23
Building lncRNA transcripts...
loading from cache
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2 out-of-bound ranges located on sequence chr4_JH584295_random. Note that ranges located
  on a sequence whose length is unknown (NA) or on a circular sequence are not considered out-of-bound (use
  seqlengths() and isCircular() to get the lengths and circularity flags of the underlying sequences). You can use
  trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
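
The warning itself suggests trim(). As a small illustration of what that does, here is a toy sketch using GenomicRanges; "chrToy" and its length of 100 are made-up values for the example, not real mm10 data.

```r
library(GenomicRanges)

# Toy sequence info: "chrToy" is a hypothetical 100 bp sequence.
toy_seqinfo <- Seqinfo(seqnames = "chrToy", seqlengths = 100)

# The second range (90-120) runs past the end of the sequence, so the
# constructor emits the same kind of out-of-bound warning; it is
# suppressed here because triggering it is the point of the example.
gr <- suppressWarnings(GRanges(
  seqnames = "chrToy",
  ranges = IRanges(start = c(10, 90), end = c(20, 120)),
  seqinfo = toy_seqinfo
))

# trim() clips ranges to the known sequence bounds.
trimmed <- trim(gr)
end(trimmed)
```

Assuming the object returned by build_annotations is a GRanges, the same call (`annotations <- trim(annotations)`) should clip the two reported ranges on chr4_JH584295_random.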

GitR-Bio avatar Feb 09 '24 06:02 GitR-Bio

Regarding the CpG island error reported above (HTTP error 403 when building "mm10_cpgs"): I have tried with "hg38" and it now appears to be working. Could you please rerun to check whether the problem has resolved itself?

annots=c("hg38_cpg_islands","hg38_genes_3UTRs","hg38_genes_intergenic","hg38_genes_exonintronboundaries","hg38_lncrna_gencode") 
annotations = build_annotations(genome = 'hg38', annotations = annots)

'select()' returned 1:1 mapping between keys and columns
Building promoters...
Building 1to5kb upstream of TSS...
Building intergenic...
Building 3UTRs...
Building exons...
Building introns...
Building exon intron boundaries...
Building CpG islands...
snapshotDate(): 2023-10-23                                                                                                 
Building lncRNA transcripts...
loading from cache

GitR-Bio avatar Feb 09 '24 06:02 GitR-Bio

Have you changed the URL in the source file? If not, do as I described in my earlier comment and try again. Make sure you restart your R session after reinstalling the updated package.

mbassalbioinformatics avatar Feb 09 '24 07:02 mbassalbioinformatics

I didn't have this problem on a fresh install. I think this is actually a transient issue. I was able to build hg19, hg38, and mm10 lncRNA resources.
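
If the failure really is transient, one workaround on the user side is to retry the call a few times before giving up. The sketch below is an assumption about how that might look, not something annotatr provides; `with_retries` is a hypothetical helper that reruns any function with an exponentially growing pause between attempts.

```r
# Hypothetical helper: call fun() up to `attempts` times, doubling the
# pause after each failure, and return the first successful result.
with_retries <- function(fun, attempts = 3, wait_seconds = 1) {
  stopifnot(is.function(fun), attempts >= 1)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = fun()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$error
    # No pause after the final attempt.
    if (i < attempts) {
      Sys.sleep(wait_seconds * 2^(i - 1))
    }
  }
  stop("all ", attempts, " attempts failed; last error: ",
       conditionMessage(last_error), call. = FALSE)
}

# Usage (not run; needs annotatr and network access):
# annotations <- with_retries(function() {
#   build_annotations(genome = "mm10", annotations = annots)
# })
```

This only helps with short outages; it does nothing if the URL has moved permanently.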

rcavalcante avatar Jun 04 '24 13:06 rcavalcante

Moreover, I have been able to build the CpG island annotations that were also mentioned in this thread.

rcavalcante avatar Jun 04 '24 13:06 rcavalcante