GTDBTk
How to get a tree containing all my MAGs
Hello, I noticed that "GTDB-TK now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple order-level subtrees." I would like to obtain a single tree that includes all of my MAGs, but I don't have the 320 GB of memory needed to run --full_tree. I did get multiple tree files by running gtdbtk classify_wf with --mash_db. However, some MAGs (looked up by their fastani_id) do not appear in any of the trees. I supplied 50 MAG FASTA files, yet the gtdbtk.bac120.user_msa.fasta.gz file contains only two MAGs. The log also reports "48 genome(s) have been classified using the ANI pre-screening step.", and the two remaining genomes are exactly the ones covered by "Identifying markers in 2 genome."
I have also uploaded the log file. In summary, I hope to obtain a tree that includes all of my MAGs without using the --full_tree option. gtdbtk.log
Hello,
To have all the genomes placed in the reference tree (split or not split), you need to use the --skip_ani_screen flag. It skips the initial ANI screening step, so every genome goes through marker identification and placement.
Unfortunately, by default you will still get multiple subtrees for the placement of your genomes. There is no way around the 320 GB memory requirement of the --full_tree option.
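As a rough sketch of what such a run could look like, the snippet below builds a classify_wf command with --skip_ani_screen. The directory names, the fa extension, and the CPU count are placeholders, not values taken from the log; adjust them to your data. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical paths and settings: replace with your own.
GENOME_DIR="mags"        # directory holding the 50 MAG FASTA files (assumed name)
OUT_DIR="gtdbtk_out"     # output directory (assumed name)

# Build the command as an array so paths with spaces stay intact.
cmd=(gtdbtk classify_wf
     --genome_dir "$GENOME_DIR"
     --out_dir "$OUT_DIR"
     --extension fa
     --skip_ani_screen
     --cpus 8)

# Print the command; run it only when DRY_RUN is explicitly set to 0.
echo "${cmd[*]}"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "${cmd[@]}"
fi
```

With the ANI screen skipped, all input genomes should show up in the user MSA and in the placement output, but without --full_tree they will still be spread across several order-level subtrees.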
Cheers, Pierre