IRanges icon indicating copy to clipboard operation
IRanges copied to clipboard

propagate mcols() through setOps where possible

Open rcorty opened this issue 5 years ago • 6 comments

Just an idea -- where possible, propagate mcols().

For example,

mcols(c(a,b)) = rbind(mcols(a), mcols(b))

(IRanges already does this)

mcols(intersect(a,b))[i, ] = cbind(mcols(a)[j, ], mcols(b)[k, ])

where j is the row in a that gave rise to row i in the intersect and k is the same for b.

mcols(setdiff(a, b))[i, ] = mcols(a)[j, ]

where j is the row in a that gave rise to row i in the setdiff.

I don't think this idea applies to union, since a single output row doesn't necessarily map back to a single row in each input IRanges.

I think similar logic could apply to findOverlaps methods.

rcorty avatar Apr 10 '19 05:04 rcorty

Unfortunately this idea doesn't apply to intersect or setdiff either because for these operations too a range in the output doesn't necessarily map back to a single range in each input object. This happens for example when some ranges have overlaps within the individual input objects:

> library(IRanges)
> a <- IRanges(c(2, 11), c(10, 15))
> b <- IRanges(c(8, 11), c(12, 20))
> intersect(a, b)
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         8        15         8

This is because the ranges in the input object are conceptually reduced before the intersection is computed i.e. everything happens as if intersect(reduce(a), reduce(b)) was performed.

This idea doesn't apply to findOverlaps either for the same reason.

hpages avatar Apr 10 '19 16:04 hpages

Ah, good point. Could/should there be a separate method for NormalRanges that does this propagation? If I'm understanding correctly, the funny business arises only in situations where the inputs aren't normal?

rcorty avatar Apr 10 '19 17:04 rcorty

If the maintainers believe this is a good idea, I'll be happy to take a stab at it.

rcorty avatar Apr 10 '19 19:04 rcorty

Sounds good to me. Just to clarify, we want to propagate the metadata columns from the left input object (x) for intersect() and setdiff(). So we only need to overwrite the intersect,IntegerRanges,IntegerRanges and setdiff,IntegerRanges,IntegerRanges methods with intersect,NormalIRanges,IntegerRanges and intersect,NormalIRanges,IntegerRanges methods. Yes this introduces a small dissymmetry in intersect() but propagating from the first argument is consistent with many other operations (and it's easy enough for the user to do intersect(b, a) instead of intersect(a, b) in case b is the NormalIRanges object). Also these new methods should probably return a NormalIRanges object. Thanks!

hpages avatar Apr 10 '19 23:04 hpages

You're very welcome -- the IRanges package has been extremely useful to me.

rcorty avatar Apr 10 '19 23:04 rcorty

One more thing, and this is just to put as much things as possible on the table, I should mention that reduce() and a few other inter-range transformations (e.g. disjoin() and range()) support the with.revmap argument for mapping back the output ranges to the input ranges. It sounds like set ops union(), intersect(), and setdiff() could also support this argument to map back the output ranges to the ranges in their first argument. If we had this, it would be trivial to implement the new intersect and setdiff methods discussed above. But unfortunately we don't have this :-/ So I think the easiest way for you is to use the heavy findOverlaps() hammer to compute this mapping. Unless you can think of something else...

hpages avatar Apr 10 '19 23:04 hpages