tskit Document visual ways of summarising subtrees (clades) when plotting


As part of https://github.com/tskit-dev/tutorials/issues/182 we should think of alternative ways of showing big trees. One possibility is to collapse clades in a tree if e.g. all the nodes underneath belong to the same population (or have no population). I think this is more of a viz issue than a tree-sequence manipulation issue, as it would be done per tree, and we would want to visually distinguish the collapsed clades somehow (perhaps we could use a larger triangle: I don't know if we would want to vary the triangle size depending on the number of samples in the clade).

We could allow this to happen even if e.g. a small proportion of the samples are in a different population. But then that gets pretty complicated. A more sophisticated thing would be to replace the circular internal nodes with a pie chart of the proportions of sample tips underneath the node. I guess in the viz you could specify which nodes you wanted to do this for.
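A rough sketch of the population-based collapsing idea, using a plain dict of children rather than tskit itself. The `children` and `leaf_population` inputs and both function names are hypothetical; with tskit the same information would presumably come from `tree.children(u)` and the population column of the node table. The per-node counts are also what a pie-chart node symbol would need.

```python
from collections import Counter


def population_counts(children, leaf_population, root):
    """Map each node to a Counter of the populations of the leaves below it."""
    # Build a preorder list; its reverse visits children before parents.
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    counts = {}
    for u in reversed(order):
        kids = children.get(u, [])
        if not kids:
            counts[u] = Counter([leaf_population[u]])
        else:
            total = Counter()
            for v in kids:
                total += counts[v]
            counts[u] = total
    return counts


def collapsible_clades(children, leaf_population, root):
    """Return the topmost internal nodes whose leaves all share one population."""
    counts = population_counts(children, leaf_population, root)
    result = []
    stack = [root]
    while stack:
        u = stack.pop()
        kids = children.get(u, [])
        if kids and len(counts[u]) == 1:
            result.append(u)  # collapse here and do not descend further
        else:
            stack.extend(kids)
    return sorted(result)


# Toy tree: node 6 is the root; leaves 0-3 carry population labels "A" or "B".
children = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
leaf_population = {0: "A", 1: "A", 2: "A", 3: "B"}
print(collapsible_clades(children, leaf_population, 6))
print(dict(population_counts(children, leaf_population, 6)[5]))
```

Allowing a small proportion of samples from another population would mean replacing the `len(counts[u]) == 1` test with a threshold on the most common population's share, which is where it starts to get complicated.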

Jun 30 '22 10:06 hyanwong

@savitakartik would be interested in this.

Jun 30 '22 11:06 jeromekelleher

One thing we could do without requiring population semantics is to have a cutoff on the number of leaves that we draw. I'd find this super helpful for the SARS-COV-2 trees, where you'd like to look at the deep structure of the tree. It would really help if we could always draw something quickly.

So, basically if you hit an internal node that has > sample_threshold samples below it, we draw a big box which says (X samples) and stop traversing downwards at that point.

Jun 30 '22 11:06 jeromekelleher

Ah yes, that's a nice idea. Another possibility if the tree is dated is to cut off a section at the bottom such that only X lineages are in the resulting tree (and summarise the tips somehow).

Lots of these operations are probably per-tree, rather than on the entire ts, by the way.

Jun 30 '22 11:06 hyanwong

Here's a quick hack where we limit drawing by depth:

import numpy as np
import tskit
import msprime

ts = msprime.sim_ancestry(10, random_seed=1)
print(ts.first().draw_text())

def chop_draw(tree, max_depth):

    ts = tree.tree_sequence
    tables = ts.tables.copy()
    tables.edges.clear()
    tables.nodes.flags = np.zeros_like(tables.nodes.flags)

    stack = [(root, 0) for root in tree.roots]
    node_labels = {}
    while len(stack) > 0:
        u, depth = stack.pop()
        node = ts.node(u)
        node_labels[u] = f"{u}"
        # print(u, depth)
        if depth < max_depth:
            for v in tree.children(u):
                stack.append((v, depth + 1))
        else:
            node_labels[u] = f"{tree.num_samples(u)} samples"
            node = node.replace(flags=1)
        tables.nodes[u] = node
        parent = tree.parent(u)
        if parent != -1:
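            # Use the tree's own interval so this also works for trees other than the first
            tables.edges.add_row(tree.interval.left, tree.interval.right, parent, u)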
    tables.sort()
    ts = tables.tree_sequence()
    ctree = ts.at(tree.interval.left)
    print(ctree.draw_text(node_labels=node_labels))


chop_draw(ts.first(), 4)

gives

                                   38                                                                                                                                                                                                         
                           ┏━━━━━━━━┻━━━━━━━┓                                                                                                                                                                                                 
                           ┃               37                                                                                                                                                                                                 
                           ┃              ┏━┻━┓                                                                                                                                                                                               
                           ┃              ┃  36                                                                                                                                                                                               
                           ┃              ┃  ┏┻━┓                                                                                                                                                                                             
                          35              ┃  ┃  ┃
                  ┏━━━━━━━━┻━━━━━━━━┓     ┃  ┃  ┃
                 34                 ┃     ┃  ┃  ┃
         ┏━━━━━━━━┻━━━━━━━━━┓       ┃     ┃  ┃  ┃
        33                  ┃       ┃     ┃  ┃  ┃
   ┏━━━━━┻━━━━━┓            ┃       ┃     ┃  ┃  ┃
   ┃           ┃           32       ┃     ┃  ┃  ┃
   ┃           ┃        ┏━━━┻━━━┓   ┃     ┃  ┃  ┃
   ┃           ┃        ┃       ┃  31     ┃  ┃  ┃
   ┃           ┃        ┃       ┃  ┏┻━┓   ┃  ┃  ┃
   ┃          30        ┃       ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┏━━┻━━┓     ┃       ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┃     ┃    29       ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┃     ┃   ┏━┻━━┓    ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┃     ┃  28    ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┃     ┃ ┏━┻┓   ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┃     ┃ ┃ 27   ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃        ┃     ┃ ┃ ┏┻┓  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃       26     ┃ ┃ ┃ ┃  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃      ┏━┻━━┓  ┃ ┃ ┃ ┃  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃     25    ┃  ┃ ┃ ┃ ┃  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃   ┏━━┻━┓  ┃  ┃ ┃ ┃ ┃  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃   ┃   24  ┃  ┃ ┃ ┃ ┃  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃   ┃  ┏━┻┓ ┃  ┃ ┃ ┃ ┃  ┃    ┃  ┃  ┃   ┃  ┃  ┃
   ┃   ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃   23  ┃  ┃   ┃  ┃  ┃
   ┃   ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┏━┻┓ ┃  ┃   ┃  ┃  ┃
   ┃   ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┃  ┃ ┃  ┃  22  ┃  ┃
   ┃   ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┃  ┃ ┃  ┃ ┏━┻┓ ┃  ┃
  21   ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃
 ┏━┻━┓ ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃
20   ┃ ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃
┏┻┓  ┃ ┃  ┃  ┃ ┃  ┃ ┃ ┃ ┃  ┃  ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃
0 6 17 2 12 13 5 14 3 7 9 19 11 18 8 10 1 16 4 15


             38                                       
    ┏━━━━━━━━━┻━━━━━━━━┓                              
   37                  ┃                              
  ┏━┻━┓                ┃                              
  ┃  36                ┃                              
  ┃  ┏┻━┓              ┃                              
  ┃  ┃  ┃             35                              
  ┃  ┃  ┃   ┏━━━━━━━━━━┻━━━━━━━━━━┓                   
  ┃  ┃  ┃   ┃                    34                   
  ┃  ┃  ┃   ┃           ┏━━━━━━━━━┻━━━━━━━━━┓         
  ┃  ┃  ┃   ┃          33                   ┃         
  ┃  ┃  ┃   ┃      ┏━━━━┻━━━━┓              ┃         
  ┃  ┃  ┃   ┃      ┃         ┃             32         
  ┃  ┃  ┃   ┃      ┃         ┃         ┏━━━━┻━━━━┓    
  ┃  ┃  ┃  31      ┃         ┃         ┃         ┃    
  ┃  ┃  ┃ ┏━┻┓     ┃         ┃         ┃         ┃    
  ┃  ┃  ┃ ┃  ┃     ┃     5 samples     ┃         ┃    
  ┃  ┃  ┃ ┃  ┃     ┃                   ┃         ┃    
  ┃  ┃  ┃ ┃  ┃     ┃                   ┃     4 samples
  ┃  ┃  ┃ ┃  ┃     ┃                   ┃              
  ┃  ┃  ┃ ┃  ┃     ┃               2 samples          
  ┃  ┃  ┃ ┃  ┃     ┃                                  
 22  ┃  ┃ ┃  ┃     ┃                                  
┏━┻┓ ┃  ┃ ┃  ┃     ┃                                  
┃  ┃ ┃  ┃ ┃  ┃ 3 samples                              
┃  ┃ ┃  ┃ ┃  ┃                                        
1 16 4 15 8 10

Jun 30 '22 15:06 jeromekelleher

Really nice! And building on that, here's another way, which restricts the total number of lineages instead. Not tested much, though.

import numpy as np
import tskit
import msprime

ts = msprime.sim_ancestry(10, random_seed=1)
print(ts.first().draw_text())

def chop_draw2(tree, max_lineages):

    ts = tree.tree_sequence
    tables = ts.tables.copy()
    tables.edges.clear()
    tables.nodes.flags = np.zeros_like(tables.nodes.flags)
    node_labels = {}
    tips = set(tree.roots)

    for n in tree.nodes(order="timedesc"):
        if tree.num_children(n) + len(tips) > max_lineages:
            break
        children = tree.children(n)
        if len(children) > 0:
            tips.remove(n)
            for c in children:
                tips.add(c)
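                # Use the tree's own interval so this also works for trees other than the first
                tables.edges.add_row(tree.interval.left, tree.interval.right, n, c)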
    for u in tips:
        node_labels[u] = str(u) if tree.is_leaf(u) else f"{tree.num_samples(u)} samples"
        node = ts.node(u).replace(flags=1)
        tables.nodes[u] = node
    tables.sort()
    ts = tables.tree_sequence()
    ctree = ts.at(tree.interval.left)
    print(ctree.draw_text(node_labels=node_labels))

chop_draw2(ts.first(), 10)

                                  38             
                          ┏━━━━━━━━┻━━━━━━━━┓    
                          ┃                37    
                          ┃               ┏━┻━┓  
                          ┃               ┃  36  
                          ┃               ┃  ┏┻━┓
                         35               ┃  ┃  ┃
                ┏━━━━━━━━━┻━━━━━━━━━┓     ┃  ┃  ┃
               34                   ┃     ┃  ┃  ┃
      ┏━━━━━━━━━┻━━━━━━━━━┓         ┃     ┃  ┃  ┃
      ┃                  33         ┃     ┃  ┃  ┃
      ┃               ┏━━━┻━━━┓     ┃     ┃  ┃  ┃
     32               ┃       ┃     ┃     ┃  ┃  ┃
  ┏━━━┻━━━┓           ┃       ┃     ┃     ┃  ┃  ┃
  ┃       ┃           ┃       ┃    31     ┃  ┃  ┃
  ┃       ┃           ┃       ┃    ┏┻━┓   ┃  ┃  ┃
  ┃       ┃          30       ┃    ┃  ┃   ┃  ┃  ┃
  ┃       ┃        ┏━━┻━━┓    ┃    ┃  ┃   ┃  ┃  ┃
 29       ┃        ┃     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┏━┻━┓     ┃        ┃     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃  28     ┃        ┃     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┏━┻┓    ┃        ┃     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ 27    ┃        ┃     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┏┻┓   ┃        ┃     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃   ┃       26     ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃   ┃     ┏━━┻━━┓  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃   ┃    25     ┃  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃   ┃   ┏━┻━┓   ┃  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃   ┃   ┃  24   ┃  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃   ┃   ┃  ┏┻━┓ ┃  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃  23   ┃  ┃  ┃ ┃  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃  ┏┻━┓ ┃  ┃  ┃ ┃  ┃    ┃    ┃  ┃   ┃  ┃  ┃
┃ ┃ ┃ ┃  ┃  ┃ ┃  ┃  ┃ ┃  ┃    ┃    ┃  ┃  22  ┃  ┃
┃ ┃ ┃ ┃  ┃  ┃ ┃  ┃  ┃ ┃  ┃    ┃    ┃  ┃ ┏━┻┓ ┃  ┃
┃ ┃ ┃ ┃  ┃  ┃ ┃  ┃  ┃ ┃  ┃   21    ┃  ┃ ┃  ┃ ┃  ┃
┃ ┃ ┃ ┃  ┃  ┃ ┃  ┃  ┃ ┃  ┃  ┏━┻━━┓ ┃  ┃ ┃  ┃ ┃  ┃
┃ ┃ ┃ ┃  ┃  ┃ ┃  ┃  ┃ ┃  ┃ 20    ┃ ┃  ┃ ┃  ┃ ┃  ┃
┃ ┃ ┃ ┃  ┃  ┃ ┃  ┃  ┃ ┃  ┃ ┏┻━┓  ┃ ┃  ┃ ┃  ┃ ┃  ┃
0 3 7 9 11 18 2 12 13 5 14 6 19 17 8 10 1 16 4 15

                 ┃                                         
      ┏━━━━━━━━━━┻━━━━━━━━━━┓                              
      ┃                     ┃                              
  ┏━━━┻━━┓                  ┃                              
  ┃      ┃                  ┃                              
┏━┻┓     ┃                  ┃                              
┃  ┃     ┃                  ┃                              
┃  ┃     ┃      ┏━━━━━━━━━━━┻━━━━━━━━━━┓                   
┃  ┃     ┃      ┃                      ┃                   
┃  ┃     ┃      ┃            ┏━━━━━━━━━┻━━━━━━━━━┓         
┃  ┃     ┃      ┃            ┃                   ┃         
┃  ┃     ┃      ┃       ┏━━━━┻━━━━┓              ┃         
┃  ┃     ┃      ┃       ┃         ┃              ┃         
┃  ┃     ┃      ┃       ┃         ┃         ┏━━━━┻━━━━┓    
┃  ┃     ┃      ┃       ┃         ┃         ┃         ┃    
┃  ┃     ┃     ┏┻━┓     ┃         ┃         ┃         ┃    
┃  ┃     ┃     ┃  ┃     ┃     5 samples     ┃         ┃    
┃  ┃     ┃     ┃  ┃     ┃                   ┃         ┃    
┃  ┃     ┃     ┃  ┃     ┃                   ┃     4 samples
┃  ┃     ┃     ┃  ┃     ┃                   ┃              
┃  ┃     ┃     ┃  ┃     ┃               2 samples          
┃  ┃     ┃     ┃  ┃     ┃                                  
┃  ┃ 2 samples ┃  ┃     ┃                                  
┃  ┃           ┃  ┃     ┃                                  
┃  ┃           ┃  ┃ 3 samples                              
┃  ┃           ┃  ┃                                        
4 15           8 10

Jun 30 '22 22:06 hyanwong

This is a perfect candidate for a dynamic notebook widget, where you click a node to toggle summarisation. One day!

Jul 01 '22 10:07 benjeffery

Meanwhile, is this something that @savitakartik would like to tackle? I could help.

Jul 04 '22 09:07 hyanwong

Yes, I'm very interested in this issue and would love to work on it!

Jul 05 '22 10:07 savitakartik

Great. Shall we chat about it tomorrow, perhaps?

Jul 05 '22 12:07 hyanwong

I've just been discussing this with @savitakartik. One problem with the "edit the tree sequence" approach is that it might be hard to apply to an entire tree sequence, for instance, if an internal node should be collapsed in one tree in the ts, but uncollapsed in another.

Here's another idea: we could create a new function for iterating over the nodes of a tree in a tree sequence, that flags up whether a node is a "collapsed" node or not. Something like

class Tree:
    def nodes_collapsed(self, order=None, return_hidden_nodes=None, collapse_method="time", ...):
        """Returns an iterator over the tuples (node_id, is_collapsed)"""

Then we adjust the drawing routines to use tree.nodes_collapsed() instead of tree.nodes()

for (u, is_collapsed) in ts.first().nodes_collapsed():
   # use this in the plotting routines instead of tree.nodes()
   # if is_collapsed is True then plot labels using tree.samples(u)

This is more involved, but I think more flexible, and I can see how we could use it to implement a (naive) version where we have SVG interactivity to hide and show clades in a tree (sequence) plot. I think we should meet with @jeromekelleher or @benjeffery to discuss the best approach here.
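To make the proposal a bit more concrete, here is a minimal sketch of such an iterator on a plain dict of children. Nothing here is existing tskit API: `nodes_collapsed`, the `should_collapse` callback and the toy `node_time` values are all made up for illustration, and a real version would need to decide what the `collapse_method` options actually mean.

```python
def nodes_collapsed(children, root, should_collapse):
    """Yield (node_id, is_collapsed) in preorder, skipping nodes hidden in collapsed clades."""
    stack = [root]
    while stack:
        u = stack.pop()
        kids = children.get(u, [])
        # Only internal nodes can be collapsed; leaves are always drawn as they are.
        collapsed = bool(kids) and should_collapse(u)
        yield u, collapsed
        if not collapsed:
            # Reverse so that the leftmost child is popped (and yielded) first.
            stack.extend(reversed(kids))


children = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
node_time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.5, 6: 4.0}
# Hypothetical "time" rule: collapse any internal node younger than 2 time units.
visible = list(nodes_collapsed(children, 6, lambda u: node_time[u] < 2.0))
print(visible)
```

Because the decision is made per call, the same node can be collapsed in one tree and left expanded in another, which is the case that is awkward when editing the tree sequence.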

Aug 22 '22 11:08 hyanwong

On a quick glance I would agree with having the collapsing be done in the drawing code rather than in the tree sequence. This also allows you to do fancy things in future such as having a node icon that summarises the clade below (e.g. pie chart by population).

Aug 23 '22 12:08 benjeffery

Here's a fun thing: download this SVG and open it in a browser. It should allow you to hide and show subclades by clicking on them: tmp

@benjeffery can probably improve my poorly coded JS, which I added to the end of the SVG file.

      function toggle_child_node_visibility(evt) {
        // Toggle every descendant element with class "node" under the clicked node
        const children = evt.currentTarget.getElementsByClassName("node");
        for (let i = 0; i < children.length; i++) {
            if (children[i].style.visibility === "hidden") {
                children[i].style.visibility = "";
            } else {
                children[i].style.visibility = "hidden";
            }
        }
        // Stop the click from also toggling ancestor nodes
        evt.stopPropagation();
      }
      const nodes = document.getElementsByClassName("node");
      for (let i = 0; i < nodes.length; i++) {
          nodes[i].addEventListener("click", toggle_child_node_visibility);
      }

Aug 23 '22 13:08 hyanwong

Cool! I'm guessing in the general case you'd want to expand/collapse the layout though?

Aug 23 '22 14:08 benjeffery

> Cool! I'm guessing in the general case you'd want to expand/collapse the layout though?

Yes, I think usually you would want to squash up nodes on the x axis if they contain branches that have been collapsed, but doing it interactively would mean the node positions would hop about, so I was wondering if the interactive version might want a statically positioned option such as this.
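A small sketch of the two layout options, assuming leaves get consecutive integer slots and each internal node sits at the mean x of its children. This is not the tskit drawing code: `assign_x`, its `squash` flag and the dict-based tree are invented for illustration, and the recursion would need replacing for very deep trees.

```python
def assign_x(children, root, collapsed, squash=True):
    """Return {node: x} for the visible nodes; collapsed nodes are drawn as tips."""
    x = {}
    next_slot = 0

    def num_leaves(u):
        kids = children.get(u, [])
        return 1 if not kids else sum(num_leaves(v) for v in kids)

    def visit(u):
        nonlocal next_slot
        kids = children.get(u, [])
        if not kids or u in collapsed:
            # One slot when squashing, otherwise the full width of the hidden clade.
            width = 1 if squash else num_leaves(u)
            x[u] = next_slot + (width - 1) / 2
            next_slot += width
        else:
            for v in kids:
                visit(v)
            x[u] = sum(x[v] for v in kids) / len(kids)

    visit(root)
    return x


children = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
print(assign_x(children, 6, collapsed={4}))                # squashed layout
print(assign_x(children, 6, collapsed={4}, squash=False))  # static layout
```

With `squash=False` the leaves outside the collapsed clade keep the slots they would have in the full tree, so toggling a clade interactively should not make them hop about.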

Aug 23 '22 14:08 hyanwong

> Yes, I think usually you would want to squash up nodes on the x axis

Just chiming in here to remind you that you can use link_ancestors to do this pushing up of nodes

Oct 21 '22 20:10 gtsambos

> > Yes, I think usually you would want to squash up nodes on the x axis
>
> Just chiming in here to remind you that you can use link_ancestors to do this pushing up of nodes

Erm, I'm not sure I follow. Here we are talking about adjusting the X position of the nodes, I think?

Oct 21 '22 20:10 hyanwong

ah, sorry. I've only skimmed this thread -- when I saw this

> One possibility is to collapse clades in a tree if e.g. all the nodes underneath belong to the same population

I assumed you meant you were going to find nodes whose descendants all have the same 'population' label, and 'simplify' the tree by getting rid of the intermediate edges between them and the leaves

Oct 21 '22 21:10 gtsambos

so it wouldn't necessarily help you do the plotting itself, but it could help to show you which nodes need to be collapsed together

Oct 21 '22 21:10 gtsambos

Right. Re viz, I know that the ETE developer has thought about interactive large tree viz in a conventional style and has some funding for it (e.g. demos with circular trees at https://www.youtube.com/watch?v=jnkuNrfx6iM).

Oct 21 '22 21:10 hyanwong

Following on from our discussion just now:

We want to reorganise the SvGDraw.assign_x_coordinates() method so that num_leaves does not need to be calculated up front, but is only used at the end.
We want to be able to specify the starting node(s) for tree drawing, e.g. tree.draw_svg(start_nodes=[1, 2]) where start_nodes=None means use tree.roots here: https://github.com/tskit-dev/tskit/blob/10a74b4df18641ff32c29b8d0b3632b094c6cbb9/python/tskit/drawing.py#L1448 . I'm not sure this makes huge amounts of sense to do for an entire tree sequence, though. Certainly not for a first pass anyway.
We want to be able to specify a cutoff for the number of lineages displayed. This can be done by carrying out a level order traversal until a certain number of lineages is surpassed, then marking all the nodes so far visited as to-be-drawn, and terminating early when drawing here: https://github.com/tskit-dev/tskit/blob/10a74b4df18641ff32c29b8d0b3632b094c6cbb9/python/tskit/drawing.py#L1450
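The lineage cutoff in the last point could look roughly like the sketch below. It works on a plain dict of children, and the function name and the `max_lineages` parameter are placeholders rather than anything in drawing.py; it stops at the first node whose expansion would exceed the cutoff, which is only one of several possible stopping rules.

```python
from collections import deque


def nodes_within_lineage_cutoff(children, roots, max_lineages):
    """Expand nodes in level order while the number of tips drawn stays within max_lineages."""
    to_draw = set(roots)
    lineages = len(roots)  # current number of tips in the partial drawing
    queue = deque(roots)
    while queue:
        u = queue.popleft()
        kids = children.get(u, [])
        if not kids:
            continue  # a real leaf: nothing to expand
        # Expanding u replaces one lineage with len(kids) lineages.
        if lineages - 1 + len(kids) > max_lineages:
            break
        lineages += len(kids) - 1
        to_draw.update(kids)
        queue.extend(kids)
    return to_draw


children = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
drawn = nodes_within_lineage_cutoff(children, roots=[6], max_lineages=3)
print(sorted(drawn))
```

In this toy case node 5 is marked to be drawn but its children are not, so the drawing code would treat it as a summarised tip and stop descending there.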

Nov 01 '22 13:11 hyanwong

This paper has some discussion about identifying identical clades in two partially collapsed trees: https://academic.oup.com/mbe/article/33/8/2163/2579233?login=false

Jan 12 '23 11:01 hyanwong
