seasonal-flu
Explore use of IQ-TREE's constraint tree option
@corneliusroemer experimented with IQ-TREE's constraint tree option to prevent IQ-TREE from putting clades in the wrong place for Nextclade reference trees. This seems like a good way to ensure correct trees, especially in 6m builds that may be lacking context sequences.
One issue brought up in the initial Slack thread:
> One issue to sort out would be delimiter in sequence names, right now IQtree renames all `/` as some weird string
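One possible way to sidestep the delimiter problem is to escape `/` in sequence names before they reach IQ-TREE and restore it afterwards. The following is only a minimal sketch: the placeholder token `_SLASH_` and the function names are hypothetical, and it assumes the token never occurs in a real sequence name.

```python
# Hypothetical placeholder; assumed never to appear in real sequence names.
SLASH_TOKEN = "_SLASH_"


def escape_name(name: str) -> str:
    """Replace every '/' so the name survives tree building unchanged."""
    return name.replace("/", SLASH_TOKEN)


def unescape_name(name: str) -> str:
    """Undo escape_name on a name read back from the resulting tree."""
    return name.replace(SLASH_TOKEN, "/")


def escape_names(names):
    """Escape a list of names and fail loudly if two names would collide."""
    escaped = [escape_name(n) for n in names]
    if len(set(escaped)) != len(set(names)):
        raise ValueError("escaping produced duplicate sequence names")
    return escaped
```

The same escaping would have to be applied consistently to the alignment and to the constraint tree, so that tip names match in both files.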
This is how it's used right now in the SC2 reference tree workflow:
Simply add the constraint tree file path after `-g` to the tree builder args:
https://github.com/neherlab/nextclade_data_workflows/blob/09be86c1718ffab2deed7060c3f7a70c135c530d/sars-cov-2/defaults/parameters.yaml#L22
And that's the hand coded tree: https://github.com/neherlab/nextclade_data_workflows/blob/feat/gisaid-v2/sars-cov-2/defaults/constraint.nwk
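As a rough illustration of what this could look like when the tree is built through `augur tree`, the sketch below assembles a tree builder argument string ending in `-g <path>` and wraps it in a command list. The helper names and the file paths are hypothetical, and the code only constructs the command without running it.

```python
import shlex


def constraint_tree_args(constraint_path, extra_args=""):
    """Return a tree builder argument string with '-g <path>' appended."""
    parts = shlex.split(extra_args) + ["-g", str(constraint_path)]
    return shlex.join(parts)


def augur_tree_command(alignment, output, constraint_path, extra_args=""):
    """Build (but do not run) an 'augur tree' call that passes the constraint."""
    return [
        "augur", "tree",
        "--alignment", alignment,
        "--output", output,
        "--tree-builder-args", constraint_tree_args(constraint_path, extra_args),
    ]


# Hypothetical paths, for illustration only.
cmd = augur_tree_command("aligned.fasta", "tree_raw.nwk", "defaults/constraint.nwk")
print(shlex.join(cmd))
```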
In the flu case, there are two options:
- Either you get (synthetic) prototypical sequences for each clade with constant names, like `2A`, `2A.1`, etc. (similar to the SC2 workflow), and hand-code a short Newick tree with the right topology
- Or you generate a constraint tree using actual sequence names, based on the topology as revealed through clade hierarchies or hand-coded in a Newick tree that's read in by Biopython.Phylo
Both approaches should work.
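To make the second option more concrete, here is a minimal sketch that turns a clade hierarchy plus a clade-to-sequence assignment into a multifurcating Newick constraint tree. All clade names, sequence names and the data layout (two plain dicts) are assumptions for illustration, not the format used by any existing workflow. It also assumes sequence names are already safe to write unquoted in Newick.

```python
def clade_to_newick(clade, children, tips):
    """Recursively build the Newick subtree for one clade.

    children: dict mapping a clade to its list of direct sub-clades
    tips:     dict mapping a clade to sequences assigned to it directly
    """
    parts = [clade_to_newick(child, children, tips) for child in children.get(clade, [])]
    parts = [p for p in parts if p]  # drop sub-clades that have no sequences
    parts += tips.get(clade, [])
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]  # avoid redundant parentheses around a single item
    return "(" + ",".join(parts) + ")"


def constraint_tree(root, children, tips):
    """Return the full constraint tree as a Newick string."""
    return clade_to_newick(root, children, tips) + ";"


# Hypothetical hierarchy: 2A.1 is nested inside 2A.
children = {"root": ["2A"], "2A": ["2A.1"]}
tips = {
    "root": ["outgroup_1"],
    "2A": ["seq_a", "seq_b"],
    "2A.1": ["seq_c", "seq_d"],
}
print(constraint_tree("root", children, tips))
```

Sequences assigned directly to a parent clade end up as siblings of the sub-clade groups, so the tree only constrains each clade to be monophyletic and leaves the resolution within clades to IQ-TREE. The resulting string could be written to a file and passed after `-g`, or read back with Biopython.Phylo for further checks.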