Biostrings
Biostrings copied to clipboard
Consistency of seqinfo getter/setter on DNAStringSet
Hi @hpages and @lawremi,
I was looking to set the Seqinfo on a DNAStringSet (created by importing a FASTA file) but found some inconsistent/incomplete behaviour, especially when rtracklayer was loaded and attached. I'm not actually sure whether DNAStringSet is meant to support Seqinfo, but I'd really like it if it did (and had reliable/consistent behaviour). I think this example should illustrate the issue(s).
Cheers, Pete
suppressPackageStartupMessages(library(Biostrings))
suppressPackageStartupMessages(library(GenomeInfoDb))
suppressPackageStartupMessages(library(GenomicRanges))
y <- DNAStringSet(x = c("s1" = "CAG", "s2" = "GGGGGT"))
seqinfo(y)
#> Seqinfo object with 2 sequences from an unspecified genome; no seqlengths:
#> seqnames seqlengths isCircular genome
#> s1 NA NA <NA>
#> s2 NA NA <NA>
seqnames(y) # Shouldn't this give same result as seqnames(seqinfo(y))?
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'seqnames' for signature '"DNAStringSet"'
seqlevels(y)
#> [1] "s1" "s2"
isCircular(y)
#> s1 s2
#> NA NA
genome(y)
#> s1 s2
#> NA NA
seqlengths(y)
#> s1 s2
#> NA NA
seqinfo(y) <- Seqinfo(
seqnames = as.character(seq_along(y)),
seqlengths = lengths(y) + 100, # Unsure if changing length should be allowed.
isCircular = c(FALSE, TRUE),
genome = rep("fake", length(y)))
seqinfo(y)
#> Seqinfo object with 2 sequences (1 circular) from fake genome:
#> seqnames seqlengths isCircular genome
#> 1 103 FALSE fake
#> 2 106 TRUE fake
seqnames(y)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'seqnames' for signature '"DNAStringSet"'
seqlevels(y)
#> [1] "1" "2"
isCircular(y)
#> 1 2
#> FALSE TRUE
genome(y)
#> 1 2
#> "fake" "fake"
seqlengths(y) # Unsure if changing length should be allowed.
#> 1 2
#> 103 106
suppressPackageStartupMessages(library(rtracklayer))
# Now get completely different results!
seqinfo(y)
#> Seqinfo object with 2 sequences from an unspecified genome:
#> seqnames seqlengths isCircular genome
#> s1 3 NA <NA>
#> s2 6 NA <NA>
seqnames(y)
#> Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'seqnames' for signature '"DNAStringSet"'
seqlevels(y)
#> [1] "s1" "s2"
isCircular(y)
#> s1 s2
#> NA NA
genome(y)
#> s1 s2
#> NA NA
seqlengths(y)
#> s1 s2
#> 3 6
Created on 2018-11-07 by the reprex package (v0.2.1)
Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.5.1 (2018-07-02)
#> os macOS Sierra 10.12.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Adelaide
#> date 2018-11-07
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.0)
#> backports 1.1.2 2017-12-13 [1] CRAN (R 3.5.0)
#> base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.5.0)
#> Biobase 2.42.0 2018-10-30 [1] Bioconductor
#> BiocGenerics * 0.28.0 2018-10-30 [1] Bioconductor
#> BiocParallel 1.16.0 2018-10-30 [1] Bioconductor
#> Biostrings * 2.50.0 2018-10-30 [1] Bioconductor
#> bitops 1.0-6 2013-08-17 [1] CRAN (R 3.5.0)
#> callr 3.0.0 2018-08-24 [1] CRAN (R 3.5.0)
#> cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
#> debugme 1.1.0 2017-10-22 [1] CRAN (R 3.5.0)
#> DelayedArray 0.8.0 2018-10-30 [1] Bioconductor
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
#> devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.1)
#> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)
#> evaluate 0.12 2018-10-09 [1] CRAN (R 3.5.1)
#> fs 1.2.6 2018-08-23 [1] CRAN (R 3.5.0)
#> GenomeInfoDb * 1.18.0 2018-10-30 [1] Bioconductor
#> GenomeInfoDbData 1.2.0 2018-10-06 [1] Bioconductor
#> GenomicAlignments 1.18.0 2018-10-30 [1] Bioconductor
#> GenomicRanges * 1.34.0 2018-10-30 [1] Bioconductor
#> glue 1.3.0 2018-07-17 [1] CRAN (R 3.5.0)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0)
#> IRanges * 2.16.0 2018-10-30 [1] Bioconductor
#> knitr 1.20 2018-02-20 [1] CRAN (R 3.5.0)
#> lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
#> Matrix 1.2-15 2018-11-01 [1] CRAN (R 3.5.0)
#> matrixStats 0.54.0 2018-07-23 [1] CRAN (R 3.5.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
#> pkgbuild 1.0.2 2018-10-16 [1] CRAN (R 3.5.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
#> processx 3.2.0 2018-08-16 [1] CRAN (R 3.5.0)
#> ps 1.2.1 2018-11-06 [1] CRAN (R 3.5.1)
#> R6 2.3.0 2018-10-04 [1] CRAN (R 3.5.0)
#> Rcpp 0.12.19 2018-10-01 [1] CRAN (R 3.5.1)
#> RCurl 1.95-4.11 2018-07-15 [1] CRAN (R 3.5.0)
#> remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.0)
#> rlang 0.3.0.1 2018-10-25 [1] CRAN (R 3.5.0)
#> rmarkdown 1.10 2018-06-11 [1] CRAN (R 3.5.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
#> Rsamtools 1.34.0 2018-10-30 [1] Bioconductor
#> rtracklayer * 1.42.0 2018-10-30 [1] Bioconductor
#> S4Vectors * 0.20.0 2018-10-30 [1] Bioconductor
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
#> stringi 1.2.4 2018-07-20 [1] CRAN (R 3.5.1)
#> stringr 1.3.1 2018-05-10 [1] CRAN (R 3.5.0)
#> SummarizedExperiment 1.12.0 2018-10-30 [1] Bioconductor
#> testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.0)
#> usethis 1.4.0 2018-08-14 [1] CRAN (R 3.5.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
#> XML 3.98-1.16 2018-08-19 [1] CRAN (R 3.5.0)
#> XVector * 0.22.0 2018-10-30 [1] Bioconductor
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)
#> zlibbioc 1.28.0 2018-10-30 [1] Bioconductor
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
Cleaning up older bug reports and coming back to this now--sorry for the lack of activity thus far. Some things have changed since this was initially reported, but there are still a few outstanding issues.
-
seqlengths
are correctly inferred and consistent in the current build (all results return3,6
regardless ofrtracklayer
) -
seqnames(x)
still throws an error whenx
is aDNAStringSet
- changes lengths is not allowed on
seqinfo<-
seqnames(x)
should not throw an error for XStringSet
objects, so I'll add that to the list of bugs to resolve. The rest of this should be good to go in the latest release.