ncov

Remove redundant include strains

Open • huddlej opened this issue 2 years ago • 1 comment

Description of proposed changes

Removes older references from the list of strains to force include in analyses. We use the single Wuhan/Hu-1/2019 strain as the reference for both alignment and time tree rooting, so the other two strains are not required. Importantly, these additional strains are used in the proximity calculations for priority-based subsampling, which leads to the unexpected inclusion of contextual strains that look like these redundant root sequences. We try to avoid this problem in the proximity calculations by defining a list of strains to ignore, but that list only includes the strain used to root the time tree. Rather than extending the ignore list to cover more strains, we can just remove the redundant strains from the include file.
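For reviewers, here is a minimal sketch of the effect of this change on the include list. It is not the workflow's actual code. It assumes the include file has one strain name per line with optional `#` comments. The `remove_strains` helper and the `hypothetical/old-ref-*` names are made up for illustration; only Wuhan/Hu-1/2019 comes from this PR.

```python
def remove_strains(include_lines, redundant):
    """Return the include-file lines with the redundant strains dropped.

    Assumes one strain name per line and '#' starting a comment
    (an assumption about the file format, not confirmed here).
    """
    redundant = set(redundant)
    kept = []
    for line in include_lines:
        # Strip any trailing comment and whitespace to get the strain name.
        name = line.split("#", 1)[0].strip()
        if name in redundant:
            continue  # drop the redundant reference strain
        kept.append(line)  # keep comments, blank lines, and other strains
    return kept


# Hypothetical include list: the real names of the two removed strains
# are not given in this description, so placeholders are used.
include_lines = [
    "# reference strains",
    "Wuhan/Hu-1/2019",
    "hypothetical/old-ref-1",
    "hypothetical/old-ref-2  # older reference",
]
result = remove_strains(
    include_lines,
    ["hypothetical/old-ref-1", "hypothetical/old-ref-2"],
)
```

With only the rooting strain left in the include list, the existing ignore list for the proximity calculations already covers every forced reference, so it does not need to grow.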

Testing

  • [x] Tested by CI
  • [x] Tested by Cassia

huddlej, Jan 25 '22 19:01

@rneher Does this look ok to you, too? This bit of the workflow is just complicated enough that I can imagine I'm missing something with this change...

huddlej, Jan 26 '22 00:01