
identical sequences not getting brlen = 0

Open tseemann opened this issue 3 years ago • 2 comments

If you give iqtree 2.1.0 identical sequences, they end up with a BL of 0.000001. I think this is a bug. The problem persists even when --polytomy is used. Also, reducing --blmin seems to reduce that BL too, but it can't be set to zero.

tseemann, Aug 21 '20 02:08

Following on from this, the logic for collapsing identical sequences means that if there are N identical sequences, N-1 are collapsed into a single entity (with branch length 0 between them) and 1 remains distinct. Given this very artificial alignment:

>outgroup
CCCCGTGAGCCCGGTAGGCCGTCGGATGCTTCCCGCCCGGCGCGCCGTCCGCCACTCGGT
CGCACGCCCGGCCGGCCCCTAATGTTCGGCCACACCGAGCGGGCGAGAGGGGTGACTCGG
>copy
CCCCGGGAGCCCGGTAGGCCGTCGGATGCGTCCCGCCCGGCGCGCCGTCCGCCACTCGGT
CGCACGCCCGGCCGGCCCCTAATGTTCGGCCACACCGAGCGGGCGAGAGGGGTGACTCGG
>copy2
CCCCGGGAGCCCGGTAGGCCGTCGGATGCGTCCCGCCCGGCGCGCCGTCCGCCACTCGGT
CGCACGCCCGGCCGGCCCCTAATGTTCGGCCACACCGAGCGGGCGAGAGGGGTGACTCGG
>copy3
CCCCGGGAGCCCGGTAGGCCGTCGGATGCGTCCCGCCCGGCGCGCCGTCCGCCACTCGGT
CGCACGCCCGGCCGGCCCCTAATGTTCGGCCACACCGAGCGGGCGAGAGGGGTGACTCGG

the following tree is produced:

(outgroup:0.0168863000,(copy:0.0000000000,copy3:0.0000000000):0.0000010000,copy2:0.0000010000);

(btw raxml-ng produces this tree:

((copy3:0.000001,copy:0.000001):0.000001,copy2:0.000001,outgroup:0.017495);

)
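To make the difference between the two trees easier to see, here is a small Python sketch that pulls out the tip (leaf) branch lengths from each Newick string. The function name `leaf_lengths` and the regex are my own for illustration and are not part of either program; the regex assumes simple taxon names like the ones in this alignment.

```python
import re

# Trees from the comment above, written with balanced parentheses.
IQTREE = ("(outgroup:0.0168863000,(copy:0.0000000000,copy3:0.0000000000)"
          ":0.0000010000,copy2:0.0000010000);")
RAXML_NG = ("((copy3:0.000001,copy:0.000001):0.000001,"
            "copy2:0.000001,outgroup:0.017495);")

# A tip is a name followed by ":" and a number. Internal branches follow ")"
# and never match, because ")" is not a name character.
LEAF = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):([0-9.eE+-]+)")


def leaf_lengths(newick: str) -> dict[str, float]:
    """Return {taxon name: tip branch length} for a simple Newick string."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


iq = leaf_lengths(IQTREE)
rx = leaf_lengths(RAXML_NG)
for name in ("copy", "copy2", "copy3"):
    print(f"{name}: iqtree={iq[name]:.6f} raxml-ng={rx[name]:.6f}")
```

In the IQ-TREE tree `copy` and `copy3` sit on zero-length tips while `copy2` gets the minimum length; in the raxml-ng tree all three identical sequences get the same minimum length.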

For both programs -blmin was kept at its default of 1e-6. This behaviour from iqtree can lead to some very surprising trees for very closely related sequences (as is common in these days of SARS-CoV-2).
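As a possible workaround until this is settled, the tree can be post-processed so that any branch at or below the minimum length is written as zero. This is only a sketch: `zero_short_branches` is a hypothetical helper of mine, not an option of iqtree or raxml-ng, and it assumes that a branch sitting exactly at the minimum really means "no change", which may not hold for every dataset.

```python
import re

# A branch length is whatever number follows a ":" in the Newick string.
BRANCH = re.compile(r":([0-9][0-9.eE+-]*)")


def zero_short_branches(newick: str, blmin: float = 1e-6) -> str:
    """Rewrite every branch length at or below blmin as 0.0."""
    def fix(match: re.Match) -> str:
        length = float(match.group(1))
        # Small relative tolerance so a length printed as exactly blmin is caught.
        if length <= blmin * (1 + 1e-9):
            return ":0.0"
        return match.group(0)  # leave longer branches exactly as printed

    return BRANCH.sub(fix, newick)


RAXML_NG_TREE = ("((copy3:0.000001,copy:0.000001):0.000001,"
                 "copy2:0.000001,outgroup:0.017495);")
print(zero_short_branches(RAXML_NG_TREE))
```

This only changes the printed lengths, not the topology, so the identical sequences still appear as a resolved cherry plus one sibling; a viewer that collapses zero-length branches would then show them as a polytomy.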

pvanheus, Oct 13 '20 18:10

btw to address the zero branch length issue I noted above (an independent question from the shape of the tree), insertTaxa() could be altered to use min_branch_length rather than 0.0. I don't know if that is the 'correct' way to deal with this, or if identical sequences should all cluster together with branch length 0. There's no actual data for the algorithm to work with here, so I think the word 'correct' might be a misnomer.
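To illustrate what that change would do to the output, here is a toy Python model of the re-insertion step. It is not IQ-TREE's code: `reinsert_duplicate` and the three-taxon `REDUCED` tree are assumptions made up for this example, standing in for the tree before the removed duplicate is added back.

```python
import re

# Assumed tree on the sequences that were kept, before copy3 is re-inserted.
REDUCED = "(outgroup:0.0168863000,copy:0.0000010000,copy2:0.0000010000);"


def reinsert_duplicate(newick: str, kept: str, removed: str,
                       branch_length: float) -> str:
    """Replace tip `kept` with a cherry (kept, removed).

    Both new tips get `branch_length`; the original length of `kept`
    becomes the length of the branch leading to the new cherry.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(kept) + r":([0-9][0-9.eE+-]*)"
    )

    def graft(match: re.Match) -> str:
        parent_length = match.group(1)
        tip = f"{branch_length:.10f}"
        return f"({kept}:{tip},{removed}:{tip}):{parent_length}"

    result, count = pattern.subn(graft, newick, count=1)
    if count != 1:
        raise ValueError(f"tip {kept!r} not found in tree")
    return result


# Behaviour reported above: the re-inserted pair gets length 0.0.
print(reinsert_duplicate(REDUCED, "copy", "copy3", 0.0))
# Suggested alternative: use the minimum branch length instead.
print(reinsert_duplicate(REDUCED, "copy", "copy3", 1e-6))
```

With 0.0 the result matches the tree iqtree reported; with 1e-6 all three identical sequences end up on tips of the same length, although `copy2` still sits outside the `copy`/`copy3` pair, so the shape question remains.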

Linking to some forum posts that discuss this issue: one, two and three. The last one discusses the --polytomy flag that Torsten mentions above.

pvanheus, Oct 13 '20 19:10