
missing tips in newick tree

Open Djeppschmidt opened this issue 1 year ago • 3 comments

Hello,

I really appreciate the Newick output format that you recently introduced!

I think there is a bug in how the tree is built. Working with the Newick file, it appears that about half of the isolate names that should be tip labels are instead attached to internal nodes. For example, I ran RabbitTClust to cluster all Salmonella in the NCBI Pathogen database (~500k isolates) using the following command:

clust-mst -d 0.001 -l -i fasta_input.txt --newick-tree -o sal.mst.clust.0001

This generates a tree with ~270k tips and ~238k labeled internal nodes (it should have ~500k tips).

I also ran a tiny version of this with 8 isolates, which produced 3 tips and 5 labeled internal nodes:

(((/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863221_contigs_skesa.fasta:0.000794,(/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863395_contigs_skesa.fasta:0.016157)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR900926_contigs_skesa.fasta:0.000969,(/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863393_contigs_skesa.fasta:0.001294)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863392_contigs_skesa.fasta:0.013981)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863223_contigs_skesa.fasta:0.000000)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863224_contigs_skesa.fasta:0.020389)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863396_contigs_skesa.fasta;
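Counts like these can be checked without a tree library by classifying each label by the character that precedes it: in Newick, a label directly after `)` names an internal node, while a label after `(` or `,` names a tip. The sketch below assumes unquoted labels that contain none of `( ) , : ;` (which holds for the file paths above). `count_labels` and the toy labels `A` to `E` are hypothetical names used only for illustration; they are not part of RabbitTClust.

```python
import re

# A label is a run of characters that are not Newick punctuation.
# Branch lengths follow ':' and are therefore never matched as labels.
LABEL = re.compile(r"([(),])([^(),:;]+)")

def count_labels(newick):
    """Split the labels of a Newick string into tips and internal nodes."""
    tips, internal = [], []
    for match in LABEL.finditer(newick):
        delimiter, label = match.groups()
        if delimiter == ")":
            internal.append(label)   # label closes a group: internal node
        else:
            tips.append(label)       # label opens or continues a group: tip
    return tips, internal

# Toy tree with the same kind of nesting as the 8-isolate output above.
toy = "((A:0.1,(B:0.2)C:0.3)D:0.0)E;"
tips, internal = count_labels(toy)
print(len(tips), len(internal))  # 2 3
```

Applied to the 8-isolate tree above, this approach should report the same 3 tips and 5 labeled internal nodes.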

This makes it impossible to filter the tree by tips because half the isolates are actually node labels, when I believe they should be tip labels.
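A possible workaround, not a fix in RabbitTClust itself, is to rewrite each labeled internal node as an extra zero-length tip hanging off that node, so that every isolate can be selected as a tip. This assumes that a labeled internal node represents an isolate located at that position in the tree, which is an assumption here rather than documented behavior. The function name `promote_internal_labels` is hypothetical, and the sketch only handles unquoted labels without `( ) , : ;` characters. Numeric support values stored as internal labels would also be promoted, so it is not suitable for trees that carry them.

```python
import re

# Matches ')' followed by an internal-node label (no Newick punctuation).
INTERNAL_LABEL = re.compile(r"\)([^(),:;]+)")

def promote_internal_labels(newick):
    """Turn every labeled internal node into a zero-length tip under it.

    '(B:0.2)C:0.3' becomes '(B:0.2,C:0):0.3': C is now a tip at distance 0
    from the (now unlabeled) internal node, so the distance between B and C
    is unchanged.
    """
    return INTERNAL_LABEL.sub(r",\1:0)", newick)

toy = "((A:0.1,(B:0.2)C:0.3)D:0.0)E;"
print(promote_internal_labels(toy))
# ((A:0.1,(B:0.2,C:0):0.3,D:0):0.0,E:0);
```

After this rewrite all five toy labels are tips, so tip-based filtering in downstream tools sees every isolate.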

Is anyone else experiencing this issue, or am I missing something?

Thanks for your help, Dietrich

Djeppschmidt, Jul 13 '23 18:07