plyranges
[BUG] Character List columns cause an error in join_overlap_left()
Hi Stuart,
Hope things are going well & I'm still finding this to be such a useful package.
I've come across a problem with join_overlap_left() if the right ranges contain a CharacterList column, as might be output from reduce_ranges() depending on the function being used. If there is a CharacterList column, the function simply outputs the error:
Error: subscript contains NAs
As a minimal reproducible example:
library(plyranges)
x <- GRanges(c("chr1:1-10", "chr1:21-30"))
y <- GRanges("chr1:25-30") %>% mutate(letter = CharacterList("a"))
join_overlap_left(x, y)
Error: subscript contains NAs
This produces the above error. However, the same error doesn't occur when using a generic S3 list column:
y$letter <- as(y$letter, "list")
join_overlap_left(x, y)
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand | letter
         <Rle> <IRanges>  <Rle> | <list>
  [1]     chr1      1-10      * |
  [2]     chr1     21-30      * |      a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
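Until this is fixed, the conversion above can be applied to every CharacterList column before joining. The following is only a sketch of that workaround: `character_lists_to_list()` is a hypothetical helper, not part of plyranges, and it assumes that assigning through `mcols(gr)[[nm]] <-` behaves the same as the `y$letter <-` assignment shown above.

```r
library(plyranges)

# Hypothetical helper (not a plyranges function): convert each CharacterList
# metadata column to a plain list, leaving all other columns untouched.
character_lists_to_list <- function(gr) {
  for (nm in names(mcols(gr))) {
    col <- mcols(gr)[[nm]]
    if (is(col, "CharacterList")) {
      # Same conversion as y$letter <- as(y$letter, "list"), one column at a time
      mcols(gr)[[nm]] <- as(col, "list")
    }
  }
  gr
}

x <- GRanges(c("chr1:1-10", "chr1:21-30"))
y <- GRanges("chr1:25-30") %>% mutate(letter = CharacterList("a"))

# Convert first, then join
res <- join_overlap_left(x, character_lists_to_list(y))
```

This only sidesteps the error. The joined column comes back as a plain list rather than a CharacterList, so it may need converting back afterwards.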
I also noticed that I couldn't find a way to change the original CharacterList column into an S3 list using mutate(), but that might be a side issue.
y <- GRanges("chr1:25-30") %>% mutate(letter = CharacterList("a"))
mutate(y, letter = as(letter, "list"))
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |      letter
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1     25-30      * |           a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
This shouldn't (to my mind) produce a character column, but should return a list column. If the object is more complicated than my toy example, it can cause the data to fall apart pretty badly.
y <- GRanges(c("chr1:25-30", "chr1:101")) %>%
  mutate(letter = CharacterList(list("a", c("b", "c"))))
y %>%
  mutate(letter = as(letter, "list"))
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |      letter
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1     25-30      * |           a
  [2]     chr1       101      * |           a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
Warning message:
In recycleSingleBracketReplacementValue(value, x, nsbs) :
number of values supplied is not a sub-multiple of the number of values to be replaced
Hopefully that's not too much information.
Cheers,
Steve
R session information
─ Session info ──────────────────────────────────────────────────────────────
setting value
version R version 4.1.0 (2021-05-18)
os Ubuntu 20.04.2 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz Australia/Adelaide
date 2021-07-02
─ Packages ──────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
Biobase 2.52.0 2021-05-19 [1] Bioconductor
BiocGenerics * 0.38.0 2021-05-19 [1] Bioconductor
BiocIO 1.2.0 2021-05-19 [1] Bioconductor
BiocManager 1.30.16 2021-06-15 [1] CRAN (R 4.1.0)
BiocParallel 1.26.0 2021-05-19 [1] Bioconductor
Biostrings 2.60.1 2021-06-06 [1] Bioconductor
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
cli 3.0.0 2021-06-30 [1] CRAN (R 4.1.0)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
DelayedArray 0.18.0 2021-05-19 [1] Bioconductor
digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
GenomeInfoDb * 1.28.0 2021-05-19 [1] Bioconductor
GenomeInfoDbData 1.2.6 2021-06-28 [1] Bioconductor
GenomicAlignments 1.28.0 2021-05-19 [1] Bioconductor
GenomicRanges * 1.44.0 2021-05-19 [1] Bioconductor
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
httpuv 1.6.1 2021-05-07 [1] CRAN (R 4.1.0)
IRanges * 2.26.0 2021-05-19 [1] Bioconductor
knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
later 1.2.0 2021-04-23 [1] CRAN (R 4.1.0)
lattice 0.20-44 2021-05-02 [4] CRAN (R 4.1.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
Matrix 1.3-4 2021-06-01 [4] CRAN (R 4.1.0)
MatrixGenerics 1.4.0 2021-05-19 [1] Bioconductor
matrixStats 0.59.0 2021-06-01 [1] CRAN (R 4.1.0)
pillar 1.6.1 2021-05-16 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
plyranges * 1.12.1 2021-06-29 [1] Bioconductor
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.1.0)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)
restfulr 0.0.13 2017-08-06 [1] CRAN (R 4.1.0)
rjson 0.2.20 2018-06-08 [1] CRAN (R 4.1.0)
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
rmarkdown 2.9 2021-06-15 [1] CRAN (R 4.1.0)
Rsamtools 2.8.0 2021-05-19 [1] Bioconductor
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rtracklayer 1.52.0 2021-05-19 [1] Bioconductor
S4Vectors * 0.30.0 2021-05-19 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
SummarizedExperiment 1.22.0 2021-05-19 [1] Bioconductor
tibble 3.1.2 2021-05-16 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
utf8 1.2.1 2021-03-12 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
workflowr * 1.6.2 2020-04-30 [1] CRAN (R 4.1.0)
xfun 0.24 2021-06-15 [1] CRAN (R 4.1.0)
XML 3.99-0.6 2021-03-16 [1] CRAN (R 4.1.0)
XVector 0.32.0 2021-05-19 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
zlibbioc 1.38.0 2021-05-19 [1] Bioconductor
I should also add that I tracked it down to the following line from .join_overlap_left():
mcols_outer <- na_dframe(mcols(right), sum(only_left))
That might save you a few minutes while debugging.
Thanks for the report Steve, I'll try to get to this one on the weekend :)
The same bug happens when the metadata column is a DNAStringSet. Would you mind adding support for this? Thank you!