Method to "reverse" a tree sequence

Open jeromekelleher opened this issue 3 years ago • 3 comments

It's useful for algorithm development to be able to "flip" the coordinates of a tree sequence around so that we read the trees and sites in the opposite order.

I propose adding a method like this:

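def mirror_coordinates(ts):
    """
    Returns a copy of the specified tree sequence in which all
    coordinates x are transformed into L - x.
    """
    L = ts.sequence_length
    tables = ts.dump_tables()
    # Read both columns before overwriting either one.
    left = tables.edges.left
    right = tables.edges.right
    # An edge covering [left, right) maps to [L - right, L - left).
    tables.edges.left = L - right
    tables.edges.right = L - left
    # Assumes integer site positions: a site at x maps to L - x - 1.
    tables.sites.position = L - tables.sites.position - 1
    # TODO migrations.
    # Flipping reverses the order of rows, so re-sort before building.
    tables.sort()
    return tables.tree_sequence()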

I think this is correct, but I'd have to sit down and write a bunch of tests to be sure. I'm particularly fuzzy about what happens when sites are not discrete. I guess the above code has to be wrong then, because L - x - 1 gives negative site positions for x > L - 1. It's probably easiest to just raise an error if not discrete_genome.
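
To make the coordinate arithmetic concrete, here is a minimal standalone sketch on plain Python values rather than tskit tables. The helper names `mirror_interval` and `mirror_site_position` are hypothetical and exist only for illustration. The sketch assumes that a site at integer position x occupies [x, x + 1), and it rejects non-integer positions instead of trying to handle them.

```python
def mirror_interval(left, right, sequence_length):
    """Map the half-open interval [left, right) to [L - right, L - left)."""
    return sequence_length - right, sequence_length - left


def mirror_site_position(position, sequence_length):
    """Map a site at integer position x to L - x - 1.

    Raises ValueError for non-integer positions, following the suggestion
    to reject inputs that do not have a discrete genome.
    """
    if position != int(position):
        raise ValueError("site positions must be integers (discrete genome)")
    return sequence_length - position - 1


L = 10.0
edges = [(0.0, 4.0), (4.0, 10.0)]
sites = [0.0, 3.0, 9.0]

# Mirroring reverses the order, so sort afterwards (the role of tables.sort()).
mirrored_edges = sorted(mirror_interval(left, right, L) for left, right in edges)
mirrored_sites = sorted(mirror_site_position(x, L) for x in sites)

print(mirrored_edges)  # [(0.0, 6.0), (6.0, 10.0)]
print(mirrored_sites)  # [0.0, 6.0, 9.0]
```

Without the integer check, a site at 9.5 on a sequence of length 10 would map to -0.5, which is the failure case described above.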

Some questions:

  1. Should this be a method of TableCollection or TreeSequence (I guess we could do the usual thing and have an "in place" method that transforms the TableCollection?)
  2. Is the name ok? I think mirror_coordinates is better than something like "reverse" as that could mean a number of things. Maybe "reflect" or something else?

Any thoughts, @benjeffery @petrelharp?

cc @astheeggeggs

jeromekelleher • Feb 03 '22 17:02

I agree that this would be an in-place operation on a TableCollection. If it found wider use you can always add a TreeSequence method later that does the operation on a copy.

As for naming, how about flip_sequence_coordinates?

benjeffery • Feb 04 '22 11:02

I like flip_sequence_coordinates, but wouldn't reverse_sequence_coordinates be even better? I'm not a big fan of "mirror", as it's not what people would first search for (they'd probably search for "reverse"). I also agree that an in-place method of TableCollection is good. I am tempted to keep it out of the TreeSequence namespace since it's a niche operation that we don't expect most users to need, but maybe this is not something to start trying to do.

petrelharp • Feb 09 '22 00:02

Let's go with reverse_sequence_coordinates then, just as a TableCollection method. If someone asks for it as a TreeSequence method, we can add it.
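
A rough sketch of how that split could look: an in-place method on the tables, plus an optional wrapper that works on a copy. `ToyTables` and `reversed_copy` are hypothetical stand-ins for illustration only. They are not the tskit classes and do not show how tskit implements this.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyTables:
    """Hypothetical stand-in for a table collection (not the tskit class)."""

    sequence_length: int
    edges: list = field(default_factory=list)  # (left, right) pairs
    sites: list = field(default_factory=list)  # integer site positions

    def reverse_sequence_coordinates(self):
        """Flip all coordinates in place, then restore sorted order."""
        L = self.sequence_length
        self.edges = sorted((L - right, L - left) for left, right in self.edges)
        self.sites = sorted(L - x - 1 for x in self.sites)


def reversed_copy(tables):
    """Copy-returning wrapper, as a tree-sequence-level method might do."""
    new_tables = copy.deepcopy(tables)
    new_tables.reverse_sequence_coordinates()
    return new_tables


original = ToyTables(sequence_length=10, edges=[(0, 4), (4, 10)], sites=[0, 3])
flipped = reversed_copy(original)

print(flipped.edges)   # [(0, 6), (6, 10)]
print(flipped.sites)   # [6, 9]
print(original.edges)  # [(0, 4), (4, 10)], the input is untouched
```

Applying the in-place method twice returns the starting coordinates, which makes for an easy property to test.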

jeromekelleher • Feb 09 '22 08:02