RabbitTClust
missing tips in newick tree
Hello,
I'm really appreciative of the newick format that you recently introduced!
I think there is a bug in how the tree is built. Working with the Newick file, the tree appears to be missing internal nodes; instead, about half of the nodes are labeled with names that should actually be tips on the tree. For example, I ran RabbitTClust to cluster all Salmonella in the NCBI pathogen database (~500k isolates) using the following command:
clust-mst -d 0.001 -l -i fasta_input.txt --newick-tree -o sal.mst.clust.0001
This generates a tree with ~270k tips and ~238k internal nodes, when it should have ~500k tips.
I also ran a tiny version of this with 8 isolates, which produced 3 tips and 5 internal nodes:
(((/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863221_contigs_skesa.fasta:0.000794,(/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863395_contigs_skesa.fasta:0.016157)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR900926_contigs_skesa.fasta:0.000969,(/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863393_contigs_skesa.fasta:0.001294)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863392_contigs_skesa.fasta:0.013981)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863223_contigs_skesa.fasta:0.000000)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863224_contigs_skesa.fasta:0.020389)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863396_contigs_skesa.fasta;
This makes it impossible to filter the tree by tips because half the isolates are actually node labels, when I believe they should be tip labels.
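For anyone who wants to check their own output, here is a minimal Python sketch of how the tip and internal-node labels can be counted. It is not part of RabbitTClust. It assumes an unquoted Newick string, and it treats any label that directly follows a closing parenthesis as an internal-node label. The function name `split_labels` and the labels `A` to `E` are made up for illustration; the example tree only mimics the shape of the one above.

```python
import re

STRUCTURAL = {"(", ")", ",", ":", ";"}

def split_labels(newick):
    """Return (tip_labels, internal_labels) for an unquoted Newick string."""
    tips, internals = [], []
    prev = "("  # treat the start of the string like an opening parenthesis
    # Tokens are either single structural characters or runs of other text
    # (node labels and branch lengths).
    for token in re.findall(r"[(),:;]|[^(),:;]+", newick.strip()):
        if token in STRUCTURAL:
            prev = token
            continue
        token = token.strip()
        if prev == ":":
            pass  # a branch length, not a label
        elif prev == ")":
            internals.append(token)  # label attached to a closed subtree
        else:
            tips.append(token)  # label right after "(" or ","
        prev = "text"
    return tips, internals

# Hypothetical 5-isolate tree with the same pattern as the 8-isolate output.
example = "((A:0.1,(B:0.2)C:0.3)D:0.0)E;"
tips, internals = split_labels(example)
print(len(tips), "tips:", tips)                 # 2 tips: ['A', 'B']
print(len(internals), "internal:", internals)   # 3 internal: ['C', 'D', 'E']
```

Running this on the 8-isolate tree above should show which isolate paths ended up as internal-node labels rather than tips.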
I'm curious if anyone else is experiencing this issue? Or maybe I'm missing something?
Thanks for your help, Dietrich