tskit icon indicating copy to clipboard operation
tskit copied to clipboard

Tree method to return cross coalescence nodes

Open jeromekelleher opened this issue 4 years ago • 3 comments
trafficstars

Following up on #1596, we'd like to build out some basic functionality to help us work with cross coalescence statistics. The simplest thing we can do to start with (as a building block we can use later either directly in the statistics or as part of our tests) is to get the actual cross coalescence nodes. Something like

def cross_coalescence_nodes(self, A, B):
    """
    Returns the nodes that are the unique MRCAs of the nodes in the specified sets of nodes.
    """

Is that it? Getting the precise definition here and thinking about what the edge cases are, will be helpful in building out the rest of the cross coalescence tools.

@stsmall, how does this sound?

jeromekelleher avatar Aug 16 '21 15:08 jeromekelleher

(Maybe that's a bad name for the function too - maybe something more directly descriptive?)

jeromekelleher avatar Aug 16 '21 15:08 jeromekelleher

Alternatively, a name could be something like mrca_list or group_mrcas?

petrelharp avatar Aug 16 '21 17:08 petrelharp

Hi All, Sorry for the lag. Here is a first draft proposal of a cross-coalescent function. I suppose my first thought is that it should take a tree rather than a tree_seq. Does "tracked_samples" apply to individual trees? An array is also prob a better return rather than the current list.

Here is my dirt-level description: Returns the 1 ... Nth mrca between two sets of nodes. In application, this could be interpreted as the Nth cross-coalescent event between two populations. The function returns the 1 ... Nth node for each tree in a tree sequence object. All events are returned since the function ascends from the tips to the Nth event. Parent nodes can be the mrca for >1 cross-coalescent events. For a simple case, imagine a highly structured geneaology between two populations, where the first cross-coalescent event is also the mrca between the two populations. In this case, the returned node contains all 1 ... N, cross-coalescent events, so the array will contain only a single node ID.

def calc_cross_coalescent(tree_seq, nodes_ls, ccN_events=10):
    """Calculate the cross coalescent of two lists of nodes.
    Parameters
    ----------
    tree_seq : Object
        object of type tskit tree seqeunce.
    nodes_ls : List
        List of node ids as integers, [[0, 1, 2],[4, 5, 6]]
    cc_N_events : int, optional
        the number of cross coalescent events to track. The default is 10.
    Returns
    -------
    ccN_ts : List
        the cross coalescent of the population from 1 ... N
    """
    ccN_ts = []
    pop_1 = nodes_ls[0]
    pop_2 = nodes_ls[1]
    iter1 = tree_seq.trees(tracked_samples=pop_1, sample_lists=True)
    iter2 = tree_seq.trees(tracked_samples=pop_2, sample_lists=True)
    p1_samples = set(pop_1)
    p2_samples = set(pop_2)
    for tree1, tree2 in zip(iter1, iter2):
        ccN_tree = []
        num_cc = 0
        used_nodes = set()
        for u in tree1.nodes(order='timeasc'):
            num_pop1 = tree1.num_tracked_samples(u)
            num_pop2 = tree2.num_tracked_samples(u)
            if num_cc < ccN_events:
                if num_pop1 > 0 and num_pop2 > 0:
                    proposed_cc1 = set(tree2.samples(u)) - p2_samples - used_nodes
                    proposed_cc2 = set(tree1.samples(u)) - p1_samples - used_nodes
                    if proposed_cc1 and proposed_cc2:
                        used_nodes |= proposed_cc1 | proposed_cc2
                        simul_cc_events = min(
                            [len(proposed_cc1), len(proposed_cc2)])
                        num_cc += simul_cc_events
                        ccN_tree.extend([u] * simul_cc_events)  # some nodes are the parent of >1 cc
        if num_cc < ccN_events:
            # padding so all trees return the same length
            ccN_tree.extend(np.repeat(np.nan, (ccN_events - num_cc)))
        ccN_ts.append(ccN_tree[:ccN_events])
    return ccN_ts

stsmall avatar Dec 17 '21 14:12 stsmall