[BUG] In the interactive interface, the number of splits in a bin containing all splits does not equal the number of items
Short description of the problem
In the interactive interface, the number of splits in a bin containing all splits does not equal the number of items in the misc-data.
anvi'o version
$ anvi-self-test --version
Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.13
Profile database .............................: 40
Contigs database .............................: 24
Pan database .................................: 21
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2
System info
macOS Sonoma 14.6.1
Detailed description of the issue
Hi anvi'o community! In my analyses, I use bins to group items that have NA misc-data with their surrounding splits so that information is passed along. Unfortunately, I began noticing a discrepancy between the number of items in the misc-data and the total number of splits in a collection of all splits: when I export a collection containing every split from a profile-db, its size does not equal the number of items in the misc-data.
Here is an example of a bin containing everything:
It has 8,641 splits:
You can reproduce the above with the files linked below:
cd TEST/
anvi-interactive
Then load the collection IQtree_test_all_bin (the interface is large and may take a moment to load).
However, the number of splits in this bin does not match the number of leaves in the tree or the number of items in the misc-data:
$ anvi-export-misc-data -p PROFILE.db --target-data-table items -o items.txt
$ wc -l items.txt
8683 items.txt
That is 8682 items once the header line is excluded.
Furthermore, the tree in the interface has the same number of leaves as there are items in the misc-data:
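`wc -l` counts every line, including the header, so the item count is one less than its output (for a file that ends with a newline and has no blank lines). A minimal Python sketch of counting only the item rows, assuming the export is tab-separated with a single header line; `count_misc_data_items` is a hypothetical helper, not an anvi'o function, and the toy columns are made up:

```python
import csv
import io


def count_misc_data_items(handle):
    """Count item rows in a tab-separated misc-data export.

    Assumes the first line is a header and each following non-empty line
    describes exactly one item.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    return sum(1 for row in reader if row)  # ignore blank lines


# Toy stand-in for items.txt (hypothetical columns): 1 header + 3 items = 4 lines.
toy = io.StringIO("item\tcategory\nsplit_a\tx\nsplit_b\ty\nsplit_c\tNA\n")
print(count_misc_data_items(toy))  # 3
```

Called as `count_misc_data_items(open("items.txt"))`, it should report the number of items without the header, provided the export has the layout assumed above.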
> library("ape")
> read.tree("Ribosomal_L14-AA_subset_remove_long_seqs_aligned_maxiters_2_trimmed_filtered_IQTREE_ultrafast_bootstrap.contree")
Phylogenetic tree with 8682 tips and 8678 internal nodes.
Tip labels:
TARA_SAMEA4397472_METAG_Ribosomal_L14_000000000033, TARA_SAMEA4397472_METAG_Ribosomal_L14_000000000095, TARA_SAMEA2623059_METAG_Ribosomal_L14_000000000001, TARA_SAMEA4397930_METAG_Ribosomal_L14_000000000016, TARA_SAMEA2620970_METAG_Ribosomal_L14_000000000095, BGEO_SAMN07136678_METAG_Ribosomal_L14_000000000018, ...
Node labels:
, 100, 89, 56, 14, 30, ...
Unrooted; includes branch lengths.
8682 tree tips
I started chatting with @metehaansever about this issue last week, but here is the formal documentation of the bug. Thanks in advance for the help and support.
Files / commands to reproduce the issue
https://uchicago.box.com/s/ggg4xso3qxrvdjphsyx006ay1uuzvjcd
This is not a bug in the interface; rather, something is wrong with the tree named Rooted_final_A. That tree contains fewer items than are present in the contigs.db.
# export the bad tree and a good one
$ anvi-export-items-order -p PROFILE.db -o Rooted_final_A.txt --name Rooted_final_A
$ anvi-export-items-order -p PROFILE.db -o IQTree.txt --name IQTree
# count the number of items (every item name contains 'split')
$ grep -o "split" Rooted_final_A.txt | wc -l
8641
$ grep -o "split" IQTree.txt | wc -l
8682
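`grep -o "split" | wc -l` shows how many leaves went missing, but not which ones. To list them, one could compare the leaf labels of the two exported trees. A minimal Python sketch, assuming each exported file holds a single Newick string with unquoted labels and no comments; `newick_leaf_names` and `missing_leaves` are hypothetical helpers, not part of anvi'o:

```python
import re

# Tokens: single structural characters, or runs of anything else (labels, lengths).
_TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")


def newick_leaf_names(newick):
    """Return the leaf (tip) labels of a Newick string in left-to-right order.

    A label counts as a leaf only if it directly follows "(" or "," (or starts
    the string). A label after ")" is an internal-node label (e.g. a bootstrap
    value) and is skipped, as is any number after ":" (a branch length).
    """
    leaves = []
    prev = "("  # treat the start of the string like an opening parenthesis
    for token in _TOKEN.findall(newick):
        if token in {"(", ")", ",", ":", ";"}:
            prev = token
        else:
            if prev in {"(", ","}:
                leaves.append(token)
            prev = "label"
    return leaves


def missing_leaves(reference_newick, other_newick):
    """Sorted leaf labels present in `reference_newick` but absent from `other_newick`."""
    kept = set(newick_leaf_names(other_newick))
    return sorted(set(newick_leaf_names(reference_newick)) - kept)


# Toy trees (hypothetical): leaf C is lost in the second one.
print(missing_leaves("((A:0.1,B:0.2)90:0.3,(C:0.4,D:0.5)75:0.1);",
                     "((A:0.1,B:0.2)90:0.3,D:0.5);"))  # ['C']
```

With the files above, `missing_leaves(open("IQTree.txt").read(), open("Rooted_final_A.txt").read())` should list the split names dropped from Rooted_final_A, if the exports are plain Newick as assumed.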
I have no idea how a tree can end up with fewer leaves than there are items in a profile.db. If you try to re-import the bad tree, anvi'o complains:
$ anvi-import-items-order -p PROFILE.db -i Rooted_final_A.txt --name toto
Target database ..............................: PROFILE.db
Database type ................................: profile
Order file path ..............................: Rooted_final_A.txt
Order data type ..............................: newick
Order name ...................................: toto
Config Error: Ehem. There is something wrong with the incoming items order data here :/
Basically, the names found in your input data do not match to the item names
found in the database. For example, this item
"BATS_SAMN08390924_METAG_Ribosomal_L14_000000000108_split_00001" is in your
database, but not in your input data
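The error above says that the item names in an incoming order must match the item names in the database. A minimal Python sketch of that kind of two-way set comparison; `check_items_order` is a hypothetical function, not anvi'o's implementation, and its messages are not anvi'o's wording:

```python
def check_items_order(db_items, input_items):
    """Raise ValueError unless both collections contain exactly the same item names."""
    db_set, input_set = set(db_items), set(input_items)
    only_in_db = sorted(db_set - input_set)        # e.g. leaves lost from a tree
    only_in_input = sorted(input_set - db_set)     # e.g. names unknown to the database
    if only_in_db or only_in_input:
        raise ValueError(
            f"{len(only_in_db)} item(s) only in the database (e.g. {only_in_db[:1]}), "
            f"{len(only_in_input)} item(s) only in the input (e.g. {only_in_input[:1]})"
        )


# Toy example (hypothetical names): one database item is missing from the order.
db = ["s1_split_00001", "s2_split_00001", "s3_split_00001"]
order = ["s2_split_00001", "s1_split_00001"]
try:
    check_items_order(db, order)
except ValueError as err:
    print(err)
```

Because the comparison runs in both directions, a tree that has merely lost leaves (as Rooted_final_A appears to have) fails it just as a tree with extra names would.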
Thanks for diving in, @FlorianTrigodet! I made a reproducible example with the files linked above. Step 3 is exactly where the number of items in the EVERYTHING bin changes from 8,682 → 8,641; it has to do with rotating the tree.
Step 1:
Change the items order to IQTree
Correct number of leaves
Step 2:
Root here:
Correct number of leaves
Step 3:
Rotate here:
Mysteriously, 41 leaves disappear :(
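Rotating a node in a tree display should only change the order in which its children are drawn, never which leaves exist, so the leaf multiset before and after a rotation must be identical. A minimal Python sketch of that invariant, assuming "rotate" means reversing the order of a node's children and using a hypothetical nested-list tree (not the interface's actual data structure); `leaves`, `rotate`, and `same_leaf_set` are illustrative names:

```python
from collections import Counter

# Hypothetical representation: a leaf is a str, an internal node is a list of children.


def leaves(tree):
    """Leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def rotate(tree, path=()):
    """Return a new tree with the children of the node at `path` reversed.

    `path` lists child indices from the root; () is the root itself. Rotating a
    leaf is a no-op. The input tree is never modified.
    """
    if not path:
        return tree if isinstance(tree, str) else list(reversed(tree))
    if isinstance(tree, str):
        raise ValueError("path descends below a leaf")
    i = path[0]
    return [rotate(child, path[1:]) if j == i else child for j, child in enumerate(tree)]


def same_leaf_set(before, after):
    """True if both trees contain exactly the same leaves (with multiplicity)."""
    return Counter(leaves(before)) == Counter(leaves(after))


tree = [["A", "B"], ["C", ["D", "E"]]]
rotated = rotate(tree, (1,))
print(leaves(rotated))               # ['A', 'B', 'D', 'E', 'C']
print(same_leaf_set(tree, rotated))  # True
```

Running a check like `same_leaf_set` before and after each operation would flag the kind of loss seen at Step 3.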