IRanges
propagate mcols() through setOps where possible
Just an idea -- where possible, propagate mcols().
For example,
mcols(c(a,b)) = rbind(mcols(a), mcols(b))
(IRanges already does this)
mcols(intersect(a,b))[i, ] = cbind(mcols(a)[j, ], mcols(b)[k, ])
where j is the row in a that gave rise to row i in the intersect and k is the same for b.
mcols(setdiff(a, b))[i, ] = mcols(a)[j, ]
where j is the row in a that gave rise to row i in the setdiff.
I don't think this idea applies to union, since a single output row doesn't necessarily map back to a single row in each input IRanges.
I think similar logic could apply to findOverlaps methods.
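As a baseline for the proposal above, here is a minimal sketch of the c() behaviour that IRanges already provides. The ranges and the `label` column are made up for illustration.

```r
library(IRanges)

# Two small objects, each with one metadata column (illustrative values).
a <- IRanges(c(1, 20), c(5, 25))
mcols(a) <- DataFrame(label = c("a1", "a2"))
b <- IRanges(30, 40)
mcols(b) <- DataFrame(label = "b1")

# c() stacks the ranges and row-binds the metadata columns.
ab <- c(a, b)
mcols(ab)
```

The metadata of the combined object should list the rows of a followed by the rows of b, which is the rbind() relation written above.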
Unfortunately this idea doesn't apply to intersect or setdiff either, because for these operations too a range in the output doesn't necessarily map back to a single range in each input object. This happens, for example, when some ranges overlap or are adjacent within the individual input objects:
> library(IRanges)
> a <- IRanges(c(2, 11), c(10, 15))
> b <- IRanges(c(8, 11), c(12, 20))
> intersect(a, b)
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         8        15         8
This is because the ranges in each input object are conceptually reduced before the intersection is computed, i.e. everything happens as if intersect(reduce(a), reduce(b)) was performed.
This idea doesn't apply to findOverlaps either, for the same reason.
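To make the ambiguity concrete, the following sketch reuses the a and b from the example above and counts how many input ranges each output range touches. It only uses reduce(), intersect() and countOverlaps(); the variable names are illustrative.

```r
library(IRanges)

a <- IRanges(c(2, 11), c(10, 15))
b <- IRanges(c(8, 11), c(12, 20))
out <- intersect(a, b)

# The set operation behaves as if both inputs had been reduced first.
out_reduced <- intersect(reduce(a), reduce(b))
same_ranges <- length(out) == length(out_reduced) &&
    all(start(out) == start(out_reduced)) &&
    all(end(out) == end(out_reduced))
same_ranges

# Number of ranges in each input that overlap each output range.
n_from_a <- countOverlaps(out, a)
n_from_b <- countOverlaps(out, b)
n_from_a
n_from_b
```

A count above 1 means the output range has no single source row in that input, so there is no unique row of metadata to carry over.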
Ah, good point. Could/should there be a separate method for NormalIRanges that does this propagation? If I'm understanding correctly, the funny business arises only in situations where the inputs aren't normal?
If the maintainers believe this is a good idea, I'll be happy to take a stab at it.
Sounds good to me. Just to clarify, we want to propagate the metadata columns from the left input object (x) for intersect() and setdiff(). So we only need to override the intersect,IntegerRanges,IntegerRanges and setdiff,IntegerRanges,IntegerRanges methods with intersect,NormalIRanges,IntegerRanges and setdiff,NormalIRanges,IntegerRanges methods. Yes, this introduces a small asymmetry in intersect(), but propagating from the first argument is consistent with many other operations (and it's easy enough for the user to do intersect(b, a) instead of intersect(a, b) in case b is the NormalIRanges object). Also, these new methods should probably return a NormalIRanges object. Thanks!
You're very welcome -- the IRanges package has been extremely useful to me.
One more thing, just to put as many things as possible on the table: I should mention that reduce() and a few other inter-range transformations (e.g. disjoin() and range()) support the with.revmap argument for mapping the output ranges back to the input ranges. It sounds like the set ops union(), intersect(), and setdiff() could also support this argument to map the output ranges back to the ranges in their first argument. If we had this, it would be trivial to implement the new intersect and setdiff methods discussed above. But unfortunately we don't have this :-/ So I think the easiest way for you is to use the heavy findOverlaps() hammer to compute this mapping. Unless you can think of something else...
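A rough sketch of the findOverlaps() approach, under stated assumptions: `intersectKeepMcols` is a hypothetical helper name, not an IRanges function, and it requires x to be normal so that every output range sits inside exactly one range of x. It returns a plain IRanges rather than the NormalIRanges suggested above. The last lines show what the existing with.revmap mapping of reduce() looks like, for comparison.

```r
library(IRanges)

# Hypothetical helper (not part of IRanges): intersect() that carries over
# the metadata columns of 'x'. Assumes 'x' is normal.
intersectKeepMcols <- function(x, y) {
    stopifnot(isNormal(x))
    out <- intersect(x, y)
    if (!is.null(mcols(x))) {
        # For each output range, index of the range of 'x' that contains it.
        idx <- findOverlaps(out, x, type = "within", select = "first")
        mcols(out) <- mcols(x)[idx, , drop = FALSE]
    }
    out
}

# Illustrative data: 'x' is normal, 'y' is arbitrary.
x <- IRanges(c(1, 20), c(10, 30))
mcols(x) <- DataFrame(id = c("x1", "x2"))
y <- IRanges(c(5, 25), c(22, 40))
res <- intersectKeepMcols(x, y)
res

# For comparison: reduce() already reports which input ranges were merged.
revmap <- mcols(reduce(IRanges(c(1, 4), c(5, 8)), with.revmap = TRUE))$revmap
revmap
```

The same pattern should work for setdiff(), since each range of setdiff(x, y) also lies within a single range of a normal x. Whether findOverlaps() is fast enough for large inputs is not something this sketch addresses.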