Explore use of IQ-TREE's constraint tree option

Open · joverlee521 opened this issue 2 years ago · 1 comment

@corneliusroemer experimented with IQ-TREE's constraint tree option to prevent IQ-TREE from putting clades in the wrong place in Nextclade reference trees. This seems like a good way to ensure correct trees, especially in 6m builds that may lack context sequences.

One issue brought up in the initial Slack thread:

> One issue to sort out would be the delimiter in sequence names; right now IQ-TREE renames all / as some weird string
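One possible workaround for the delimiter issue, sketched under assumptions: swap `/` for a placeholder before tree building and restore it afterwards. The placeholder `_SLASH_` and the helper names are hypothetical, and the exact set of characters IQ-TREE rewrites is not stated in this thread. The same escaping would need to be applied to the names in the constraint tree so they still match the alignment.

```python
# Hypothetical placeholder; any token that never occurs in real names would do.
PLACEHOLDER = "_SLASH_"


def escape_name(name):
    """Replace '/' so the tree builder sees a name it will not rewrite."""
    if PLACEHOLDER in name:
        raise ValueError(f"name already contains the placeholder: {name}")
    return name.replace("/", PLACEHOLDER)


def unescape_name(name):
    """Restore the original '/' delimiters after tree building."""
    return name.replace(PLACEHOLDER, "/")


def escape_fasta(lines):
    """Yield FASTA lines with escaped header names; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + escape_name(line[1:].rstrip("\n")) + "\n"
        else:
            yield line
```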

joverlee521, Mar 23 '22 23:03

This is how it's used right now in the SC2 reference tree workflow:

Simply add the constraint tree file path after -g in the tree builder args: https://github.com/neherlab/nextclade_data_workflows/blob/09be86c1718ffab2deed7060c3f7a70c135c530d/sars-cov-2/defaults/parameters.yaml#L22

And this is the hand-coded constraint tree: https://github.com/neherlab/nextclade_data_workflows/blob/feat/gisaid-v2/sars-cov-2/defaults/constraint.nwk
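As a rough illustration of what "add the path after -g" amounts to, here is a sketch that assembles a tree-building command. It assumes the workflow forwards the args to IQ-TREE via `augur tree --tree-builder-args`; the function name and the file paths are placeholders, not values taken from the linked workflow.

```python
import shlex


def augur_tree_command(alignment, output, constraint_tree, extra_args=""):
    """Build (but do not run) an augur tree command with a constraint tree."""
    # -g is IQ-TREE's option for a constraint tree file.
    builder_args = f"-g {constraint_tree} {extra_args}".strip()
    return [
        "augur", "tree",
        "--alignment", alignment,
        "--output", output,
        "--tree-builder-args", builder_args,
    ]


# Placeholder paths for illustration only.
cmd = augur_tree_command("aligned.fasta", "tree_raw.nwk", "defaults/constraint.nwk")
print(shlex.join(cmd))
```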

In the flu case, there are two options:

  1. Either you get (synthetic) prototypical sequences for each clade with constant names, like 2A, 2A.1, etc. (similar to the SC2 workflow), and hand-code a short Newick tree with the right topology
  2. Or you generate a constraint tree from actual sequence names, based on the topology as revealed through the clade hierarchies or hand-coded in a Newick tree that's read in with Biopython's Bio.Phylo

Both approaches should work.
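A minimal sketch of the second option, assuming the clade hierarchy is available as a child-to-parent mapping and that each sequence has already been assigned to its most specific clade. Both input formats and all names here are hypothetical, and the sketch builds the Newick string directly rather than going through Bio.Phylo. It relies on IQ-TREE accepting multifurcating constraint trees that need not contain every sequence in the alignment.

```python
def constraint_newick(parents, members):
    """Render a multifurcating Newick constraint tree.

    parents: clade name -> parent clade name (None for a root clade)
    members: clade name -> list of sequence names assigned to that clade
    """
    children = {}
    roots = []
    for clade, parent in parents.items():
        if parent is None:
            roots.append(clade)
        else:
            children.setdefault(parent, []).append(clade)

    def render(clade):
        # Tips of this clade sit in a polytomy next to each child-clade subtree.
        parts = list(members.get(clade, []))
        parts += [render(child) for child in children.get(clade, [])]
        parts = [p for p in parts if p]
        if not parts:
            return ""
        if len(parts) == 1:
            return parts[0]
        return "(" + ",".join(parts) + ")"

    top = [r for r in (render(root) for root in roots) if r]
    body = top[0] if len(top) == 1 else "(" + ",".join(top) + ")"
    return body + ";"


# Hypothetical clades and sequence names for illustration.
parents = {"2": None, "2A": "2", "2A.1": "2A"}
members = {
    "2": ["seq_a", "seq_b"],
    "2A": ["seq_c", "seq_d"],
    "2A.1": ["seq_e", "seq_f"],
}
tree = constraint_newick(parents, members)
```

Each nested group only forces the sequences of a clade and its descendants to be monophyletic; the resolution inside every polytomy is left to IQ-TREE.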

corneliusroemer, Mar 23 '22 23:03