valr
[bed_intersect] consider warning the user when input data is passed multiple times through the %>% operator
It's possible for the user to mistakenly pass a dataset twice to bed_intersect() when using the %>% operator. This happens when the piped data is also used inside a nested function call within bed_intersect(), e.g. bed_intersect(group_by(., strand)). The data is passed implicitly to the x argument, and then a second time as a dots (...) argument. I'm not sure what options we have to detect this on our end, and for some users this might be a feature rather than a bug. The native pipe operator won't allow this, as its placeholder can't be used twice or inside a nested call.
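The magrittr rule behind this: the left-hand side is inserted as the first argument of the right-hand call unless the dot placeholder (.) appears as a top-level argument of that call. A dot nested inside another call, as in group_by(., strand), doesn't count, so the data is inserted first and also used in the nested call. A minimal sketch of the three cases, using a toy function n_args() that is hypothetical and only counts its arguments:

```r
library(magrittr)

# Toy function (hypothetical, for illustration): reports how many arguments it received
n_args <- function(...) length(list(...))

# Dot used as a top-level argument: the LHS is NOT also inserted first
1 %>% n_args(.)              # 1 argument: n_args(1)

# Dot used only inside a nested call: the LHS IS also inserted as the first
# argument, so n_args() sees the data twice
1 %>% n_args(identity(.))    # 2 arguments: n_args(1, identity(1))

# Wrapping the right-hand side in braces turns off first-argument insertion
1 %>% {n_args(identity(.))}  # 1 argument
```

The second case is exactly what happens in bed_intersect(group_by(., strand)).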
library(valr)
library(dplyr)
library(tibble)
x <- tribble(
~chrom, ~start, ~end, ~strand,
"1", 0L, 5L, "+"
)
# x is passed twice: implicitly as the x argument, and again inside group_by(., strand)
x %>%
bed_intersect(group_by(., strand))
#> # A tibble: 1 × 9
#> chrom start.x end.x strand.x start.y end.y strand.y .source .overlap
#> <chr> <int> <int> <chr> <int> <int> <chr> <chr> <int>
#> 1 1 0 5 + 0 5 + 1 5
# is equivalent to:
bed_intersect(x, group_by(x, strand))
#> # A tibble: 1 × 9
#> chrom start.x end.x strand.x start.y end.y strand.y .source .overlap
#> <chr> <int> <int> <chr> <int> <int> <chr> <chr> <int>
#> 1 1 0 5 + 0 5 + 1 5
# this can be confusing when intersecting with another tibble: x is silently intersected with itself as well as with y
y <- tribble(
~chrom, ~start, ~end, ~nonsense, ~strand,
"XX", 100L, 500L, "hello!", "-"
)
x %>%
bed_intersect(group_by(., strand), group_by(y, strand))
#> # A tibble: 1 × 10
#> chrom start.x end.x strand.x start.y end.y strand.y nonsense.y .source
#> <chr> <int> <int> <chr> <int> <int> <chr> <chr> <chr>
#> 1 1 0 5 + 0 5 + <NA> 1
#> # ℹ 1 more variable: .overlap <int>
# is equivalent to:
bed_intersect(x, group_by(x, strand), group_by(y, strand))
#> # A tibble: 1 × 10
#> chrom start.x end.x strand.x start.y end.y strand.y nonsense.y .source
#> <chr> <int> <int> <chr> <int> <int> <chr> <chr> <chr>
#> 1 1 0 5 + 0 5 + <NA> 1
#> # ℹ 1 more variable: .overlap <int>
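If the intent is to intersect a grouped x with a grouped y, one way to write this with %>% so that x reaches bed_intersect() only once is to make the grouping its own pipeline step. A minimal sketch, redefining the x and y tibbles from the reprex so it runs on its own:

```r
library(valr)
library(dplyr)
library(tibble)

x <- tribble(
  ~chrom, ~start, ~end, ~strand,
  "1", 0L, 5L, "+"
)
y <- tribble(
  ~chrom, ~start, ~end, ~nonsense, ~strand,
  "XX", 100L, 500L, "hello!", "-"
)

# Grouping is a separate pipeline step, so x is only the x argument
# and group_by(y, strand) is the only dots input
res <- x %>%
  group_by(strand) %>%
  bed_intersect(group_by(y, strand))

# The same call written without a pipe
res2 <- bed_intersect(group_by(x, strand), group_by(y, strand))
```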
This doesn't happen with the native pipe, as you can't (implicitly or explicitly) pass the data twice.
x |> bed_intersect(group_by(.data = _, strand))
Error in bed_intersect(x, group_by(.data = "_", strand)) :
invalid use of pipe placeholder (<input>:1:0)
Created on 2024-04-03 with reprex v2.1.0
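One possible check on our end, sketched under assumptions: a hypothetical helper, warn_if_repeated_input() (not part of valr), that bed_intersect() could call on its x and ... inputs before doing any work. It compares each dots table to x after dropping groups. It can't tell this mistake apart from a deliberate self-intersection such as bed_intersect(x, x), which would also trigger the warning, so it may fit better as a message than a warning.

```r
library(dplyr)

# Hypothetical helper (not part of valr): warn when any table passed through
# `...` holds the same data as `x`, ignoring grouping.
warn_if_repeated_input <- function(x, ...) {
  y_tbls <- list(...)
  # Drop groups and tibble classes so a grouped copy of x compares equal to x
  x_data <- as.data.frame(ungroup(x))
  repeated <- vapply(
    y_tbls,
    function(y) identical(as.data.frame(ungroup(y)), x_data),
    logical(1)
  )
  if (any(repeated)) {
    warning(
      "`x` was also supplied through `...` at position(s) ",
      paste(which(repeated), collapse = ", "),
      "; was the data piped into bed_intersect() twice?",
      call. = FALSE
    )
  }
  # Return which dots inputs matched x, invisibly
  invisible(repeated)
}
```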