expand definition of 20C/S.501T

Open dpark01 opened this issue 4 years ago • 6 comments

Description of proposed changes

Current definition of 20C/S.501T excludes all American S:501T samples from 20C. This change expands the definition to a monophyletic subclade of 20C that is observed to contain the previous 20C/S.501T within it, but also a much larger number of American 501Ts that appear to be related.

Many of the S:501T human-associated viruses in the US appear to share three of the four SNPs in the current subclades.tsv definition for 20C/S.501T (nuc 12850G, 21364T, and 23064C). However the first SNP in the definition, 11417T, appears to be too restrictive. This PR suggests changing it to a more ancestral SNP, given the currently observed phylogeny. An alternative might be to remove it altogether and just stick with three SNPs.
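To make the trade-off concrete, here is a minimal sketch of how an "all SNPs must match" definition behaves under the three options discussed. It is an illustration only, not the actual clade-assignment code: the dictionaries and the `matches` helper are hypothetical, the sample genotype is made up, and the replacement position 3231T is the one named later in this thread.

```python
# Hypothetical illustration: each definition maps nucleotide position -> required allele.
# Positions and alleles are the ones quoted in this PR discussion.
CURRENT_DEFINITION = {11417: "T", 12850: "G", 21364: "T", 23064: "C"}
PROPOSED_DEFINITION = {3231: "T", 12850: "G", 21364: "T", 23064: "C"}
THREE_SNP_DEFINITION = {12850: "G", 21364: "T", 23064: "C"}


def matches(definition, genotype):
    """Return True only if the sample carries every defining allele."""
    return all(genotype.get(pos) == alt for pos, alt in definition.items())


# Made-up sample: carries the three shared SNPs plus 3231T.
# Position 11417 is absent from the dict, meaning the sample lacks 11417T.
us_sample = {3231: "T", 12850: "G", 21364: "T", 23064: "C"}

for name, definition in [
    ("current (11417T)", CURRENT_DEFINITION),
    ("proposed (3231T)", PROPOSED_DEFINITION),
    ("three SNPs only", THREE_SNP_DEFINITION),
]:
    print(name, matches(definition, us_sample))
```

Under these assumptions, a sample like this one is excluded by the current definition but captured by either alternative. The three-SNP option is the most permissive, since it would also accept samples that lack 3231T.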

Related issue(s)

None

Testing

  • Current Nextstrain North America view: https://nextstrain.org/ncov/north-america?c=gt-S_501&f_clade_membership=20C&f_country=USA&f_host=Human&gt=S.501T
  • New England weighted view: https://auspice.broadinstitute.org/sars-cov-2/neusa/20210209?c=gt-S_501&f_Nextstrain_clade=20C&f_country=USA&p=grid

dpark01 avatar Feb 11 '21 01:02 dpark01

Thanks for this @dpark01 ! I spotted this a little while back but it completely fell off my plate and I never did anything about it. I've had a look at the tree and I agree this would be a more sensible place to draw this line.

For anyone else taking a look at this PR, here's a zoom in of 20C (in the focal 501 build) with the current 20C/S:501T marked in the middle, colored by the proposed base change 3231 to show how the clade would expand: https://nextstrain.org/groups/neherlab/ncov/S.N501?c=gt-nuc_3231&label=clade:20C&m=div

image

emmahodcroft avatar Feb 11 '21 10:02 emmahodcroft

@emmahodcroft actually your build kind of argues for removing this SNP altogether and sticking with the other three. From your screenshot you can see some nearby neighbors that missed the 3231T coloring, and if we recolor your tree on 501T genotype, it suggests 23064C as the ancestral SNP (which is already in the list as one of the other three we'd want to keep).

Thoughts? I can edit this PR if we want to go from four to three. I can phylogenetically see arguments either way.

dpark01 avatar Feb 11 '21 14:02 dpark01

Okay, one argument in favor of the new four-SNP definition (which is what this PR currently does) instead of a more permissive three-SNP definition is that, looking at Emma's build, PANGOLIN seems to be calling the four-SNP definition B.1.517, whereas the other nearby 501T samples are getting other assignments.

dpark01 avatar Feb 12 '21 02:02 dpark01

You raise a good point here Danny, but looking at the tree I think it might be worth distinguishing the 'top' of that group, as defined by 3231, from the bottom, which includes a mink outbreak in the USA that appears to be over. I think it's just the 'top' that's expanding now and remains relevant.

I'd also be cautious about concluding that this (larger) group really stems from a single origin of 501: because these are 501-focused trees, the ancestral reconstruction can pull together sequences that share 501 but where it arose separately, and give them that as a common ancestor! When groups share multiple mutations, this is stronger evidence that they really are part of a monophyletic group (like with the 3231 mutation, where other mutations in some ORFs also group them together).

So considering all that I think I'd vote for the change to 3231!

emmahodcroft avatar Feb 12 '21 12:02 emmahodcroft

@emmahodcroft, is there any value to merging this now (after resolving conflicts) or is it too late to matter much? (I'm trying to clean up old PRs.)

huddlej avatar Nov 11 '21 00:11 huddlej

I think this is likely so old it's not relevant anymore, but if we wanted to be complete, I'd go by what old-Emma says and vote to change to 3231. I've long forgotten the details but I trust old-Emma. Otherwise, we can just let it lapse.

emmahodcroft avatar Nov 11 '21 10:11 emmahodcroft