Make genotype filtering consistent with other traits

Open trvrb opened this issue 3 years ago • 2 comments

Current Behavior
Currently, genotype filtering behaves differently from other filter types. This can be immediately seen by comparing filtering to clade and filtering to genotype.

Here is filtering to clades A1b/197R and A1b/94N for influenza H3N2: https://nextstrain.org/flu/seasonal/h3n2/ha/2y?f_clade_membership=A1b/197R,A1b/94N

[Screenshot: clade-filter]

This is my expectation for how all filters should work (except for date filter, which can be a special case). Tips belonging to clades A1b/197R or A1b/94N are selected as specified in the byline as "showing 471 of 1513 genomes sampled" and then the subtree leading to these tips is highlighted in view. This has the nice property that we have a single subtree and this should work as intended for downloading Newick or Nexus.
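To make the "tips, then connecting subtree" behaviour concrete, here is a minimal sketch on a toy tree. It is one reading of the description above, not auspice's actual implementation: the `Node` class, the `tip_subtree_filter` name, and the choice to highlight everything from the selected tips up to their most recent common ancestor are all assumptions made for illustration.

```python
class Node:
    """Hypothetical tree node; traits are stored as keyword arguments."""
    def __init__(self, name, **traits):
        self.name = name
        self.traits = traits
        self.children = []
        self.parent = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def tips(node):
    """Yield every terminal node below (or equal to) `node`."""
    if not node.children:
        yield node
    for child in node.children:
        yield from tips(child)


def tip_subtree_filter(root, trait, values):
    """Select tips whose trait is in `values`, then highlight the subtree joining them."""
    selected = [t for t in tips(root) if t.traits.get(trait) in values]

    # Count, for every ancestor, how many selected tips descend from it.
    counts = {}
    for tip in selected:
        node = tip
        while node is not None:
            counts[node] = counts.get(node, 0) + 1
            node = node.parent

    # Walk down from the root to the most recent common ancestor (MRCA):
    # keep descending while a single child still contains all selected tips.
    mrca = root
    while selected:
        below = [c for c in mrca.children if counts.get(c, 0) == len(selected)]
        if not below:
            break
        mrca = below[0]

    # Nodes strictly above the MRCA are not part of the connecting subtree.
    above = set()
    node = mrca.parent
    while node is not None:
        above.add(node)
        node = node.parent

    highlighted = {n.name for n in counts if n not in above}
    return sorted(t.name for t in selected), highlighted


# Toy tree: two internal nodes, four tips with made-up clade labels.
root = Node("root")
a = root.add(Node("A"))
b = root.add(Node("B"))
a.add(Node("t1", clade="X"))
a.add(Node("t2", clade="Y"))
b.add(Node("t3", clade="X"))
b.add(Node("t4", clade="Z"))

selected, highlighted = tip_subtree_filter(root, "clade", {"X"})
```

On this toy tree, filtering to clade `X` selects `t1` and `t3` and highlights `root`, `A`, `B`, `t1` and `t3`, so the result is always one connected subtree.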

However, the new genotype filter behaves quite differently. Here is filtering to nodes with genotype 198P: https://nextstrain.org/flu/seasonal/h3n2/ha/2y?c=gt-HA1_198&gt=HA1.198P

[Screenshot: genotype-filter]

Notice that this works very differently from the clade filter. In this case there is a direct filtering to nodes with 198P.
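For contrast, direct node filtering can be sketched as follows. Again this is only an illustration under assumptions: the `Node` class, the `genotype` attribute and the `direct_node_filter` name are hypothetical, and the states on the toy tree are made up.

```python
class Node:
    """Hypothetical tree node carrying an inferred genotype at a single site."""
    def __init__(self, name, genotype):
        self.name = name
        self.genotype = genotype
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def walk(node):
    """Yield `node` and every node below it, internal nodes included."""
    yield node
    for child in node.children:
        yield from walk(child)


def direct_node_filter(root, state):
    """Keep exactly those nodes (tips or internal) whose own genotype matches."""
    return {n.name for n in walk(root) if n.genotype == state}


# Toy tree: the root is S, one clade switched to P, plus one isolated P tip.
root = Node("root", "S")
a = root.add(Node("A", "P"))
b = root.add(Node("B", "S"))
a.add(Node("t1", "P"))
a.add(Node("t2", "P"))
b.add(Node("t3", "P"))
b.add(Node("t4", "S"))

visible = direct_node_filter(root, "P")
```

Here `visible` is `{"A", "t1", "t2", "t3"}`: the `A` clade and the lone tip `t3` are shown, but `root` and `B`, which would connect them, are not. The result is therefore not a single subtree.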

Expected behavior
I don't see how we can justify the tip + subtree filter for clades and the direct node filter for genotypes. Nodes are labeled based on clade_membership just as nodes are labeled based on genotype. I find it surprising that genotype filters should behave so differently here. Note that region filters (https://nextstrain.org/flu/seasonal/h3n2/ha/2y?c=region&f_region=China), etc. all function like the clade filter, where tips are selected and their subtree highlighted.
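The split is also visible in the links themselves: clade and region filters travel as `f_<trait>=...` query parameters, while the genotype filter uses a separate `gt=...` parameter. A small sketch that pulls the trait filters out of the URLs above, using only the standard library (the `trait_filters` helper is a hypothetical name, and the comma-separated `f_` convention is inferred from these links only):

```python
from urllib.parse import urlsplit, parse_qs


def trait_filters(url):
    """Return {trait: [values]} for every `f_<trait>` query parameter in `url`."""
    query = parse_qs(urlsplit(url).query)
    return {
        key[2:]: values[0].split(",")
        for key, values in query.items()
        if key.startswith("f_")
    }


clade_url = "https://nextstrain.org/flu/seasonal/h3n2/ha/2y?f_clade_membership=A1b/197R,A1b/94N"
region_url = "https://nextstrain.org/flu/seasonal/h3n2/ha/2y?c=region&f_region=China"
genotype_url = "https://nextstrain.org/flu/seasonal/h3n2/ha/2y?c=gt-HA1_198&gt=HA1.198P"

clade_filters = trait_filters(clade_url)
region_filters = trait_filters(region_url)
genotype_filters = trait_filters(genotype_url)  # no f_ parameters in this link
genotype_param = parse_qs(urlsplit(genotype_url).query)["gt"]
```

The clade link yields `{"clade_membership": ["A1b/197R", "A1b/94N"]}` and the region link `{"region": ["China"]}`, whereas the genotype link has no `f_` entries at all and carries `["HA1.198P"]` under `gt`.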

I'd suggest streamlining genotype filtering to work exactly like clade filtering, where filtering applies to tips rather than nodes.

In practice for ncov this would make the genotype filter to 484K,501Y (https://nextstrain.org/ncov/global?gt=S.484K,501Y):

[Screenshot: ncov-genotypes]

function like the clade filter to 501Y.V2 and 501Y.V3 (https://nextstrain.org/ncov/global?f_clade_membership=20H/501Y.V2,20J/501Y.V3):

[Screenshot: ncov-clades]

I also see this subtree strategy as being significantly more compatible with algorithms to do an "accordion zoom".

What do you think @jameshadfield?

trvrb, Jan 30 '21 20:01

It's true that there is inconsistency in filtering at the moment; however, I believe that the current implementation of genotype filtering is "more truthful" and that the problems lie with our filtering algorithm for "normal" filters. I wrote about this in #1275.

For situations where the internal nodes are annotated (e.g. genotype, clade_membership) I believe the "most truthful" behavior is that which we currently do for genotypes. In other words, I think the current behavior for clade_membership (your second screenshot) is wrong (see #1275 for more examples of this). If this were not to be the case, then for the HA1:198P example, we would make the root visible, which will almost certainly be interpreted to mean that the ancestral state of the tree is HA1:198P and it has been lost many times.

There may be different interpretations of "filtering": are we conveying the evolutionary paths through the tree to get to a filtered set of tips, or are we conveying which parts of the tree match a specific query? We have been doing the former, but where possible I believe we should do the latter.

jameshadfield, Jan 31 '21 23:01

> which will almost certainly be interpreted to mean that the ancestral state of the tree is HA1:198P and it has been lost many times.

I'm afraid I don't agree here. If https://nextstrain.org/flu/seasonal/h3n2/ha/2y?c=gt-HA1_198&gt=HA1.198P had "tip filtering" it would show the same 231 tips, but would also show the entire subtree that connects them. Because the coloring is by site HA1 198 anyway this would show that the ancestral state is S and there were multiple transitions from S to P on the tree. There would be green S branches, yellow P branches and yellow P tips.
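A toy version of that argument, as a sketch only: the tree, the node names and the per-node states below are made up, and the subtree is found by walking from each selected tip to the root (which happens to be the common ancestor of the selected tips in this example).

```python
# Hypothetical toy tree as child -> parent links, with an inferred state per node.
parent = {"A": "root", "B": "root", "t1": "A", "t2": "A", "t3": "B", "t4": "B"}
state = {"root": "S", "A": "P", "B": "S", "t1": "P", "t2": "P", "t3": "P", "t4": "S"}
tip_names = ["t1", "t2", "t3", "t4"]


def connecting_subtree(selected_tips):
    """Collect every node on a path from a selected tip up to the root."""
    shown = set()
    for tip in selected_tips:
        node = tip
        while node is not None and node not in shown:
            shown.add(node)
            node = parent.get(node)  # None once we step past the root
    return shown


# Tip filtering: select the tips carrying P, then show the subtree joining them.
selected = [t for t in tip_names if state[t] == "P"]
shown = connecting_subtree(selected)

# Branches in the shown subtree where the colouring changes between parent and child.
transitions = sorted(
    (parent[child], child)
    for child in shown
    if child in parent and state[parent[child]] != state[child]
)
```

The shown subtree includes `root` and `B`, both coloured S, and `transitions` is `[("B", "t3"), ("root", "A")]`: two independent S to P changes, with the root still drawn in its own state rather than as P.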

trvrb, Feb 01 '21 05:02