
How to get a tree containing all my MAGs

Open Sh1von opened this issue 1 year ago • 1 comments

Hello, I noticed that "GTDB-TK now uses a divide-and-conquer approach where the bacterial reference tree is split into multiple order-level subtrees." I would like to obtain a single tree that includes all of my MAGs, but I don't have the 320 GB of memory needed to run --full_tree. I did get multiple tree files by running gtdbtk classify_wf with --mash_db. Strangely, some MAGs (corresponding to their fastani_id) do not appear in any of the trees. I supplied 50 MAG FASTA files, but the gtdbtk.bac120.user_msa.fasta.gz file contains only two MAGs. The log also says "48 genome(s) have been classified using the ANI pre-screening step.", and the remaining two MAGs are exactly the ones covered by "Identifying markers in 2 genome."

I have also attached the log file (gtdbtk.log). In summary, I would like a tree that includes all of my MAGs without using the --full_tree option.
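For anyone who wants to confirm which inputs never reached the alignment, here is a minimal Python sketch. It assumes the sequence IDs in gtdbtk.bac120.user_msa.fasta.gz match the input file names with the extension removed; the function names `msa_genome_ids` and `missing_from_msa` are hypothetical helpers, not part of GTDB-Tk.

```python
import gzip
from pathlib import Path


def msa_genome_ids(msa_path):
    """Collect the first token of every '>' header in a gzipped FASTA file."""
    ids = set()
    with gzip.open(msa_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    ids.add(parts[0])
    return ids


def missing_from_msa(genome_dir, msa_path, extension="fa"):
    """Return input genomes whose ID is absent from the MSA.

    Assumption: the genome ID is the file name minus '.<extension>'.
    """
    suffix = "." + extension
    expected = {
        p.name[: -len(suffix)] for p in Path(genome_dir).glob("*" + suffix)
    }
    return sorted(expected - msa_genome_ids(msa_path))
```

Calling `missing_from_msa("mags/", "gtdbtk_out/align/gtdbtk.bac120.user_msa.fasta.gz")` (both paths are placeholders) would list the MAGs that are not in the alignment, which in a case like this one should be the genomes handled by the ANI pre-screening step.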

Sh1von, Jan 12 '24 03:01

Hello, to have all genomes placed in the reference tree (split or not), you need to use the --skip_ani_screen flag, which skips the initial ANI screening step. Unfortunately, by default you will still get multiple subtrees for the placement of your genomes. There is no way around the 320 GB memory requirement for the --full_tree option.
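As a rough illustration of the suggestion above, the following Python sketch assembles a classify_wf command line that includes --skip_ani_screen. The directory names, the `fa` extension, the CPU count, and the helper `build_classify_cmd` are assumptions chosen for the example; adjust them to your own setup.

```python
import shlex


def build_classify_cmd(genome_dir, out_dir, extension="fa", cpus=8):
    """Build an argv list for gtdbtk classify_wf with the ANI screen skipped."""
    return [
        "gtdbtk", "classify_wf",
        "--genome_dir", str(genome_dir),   # folder holding the MAG FASTA files
        "--out_dir", str(out_dir),         # where GTDB-Tk writes its results
        "--extension", extension,          # file extension of the MAGs
        "--cpus", str(cpus),
        "--skip_ani_screen",               # place every genome in the tree(s)
    ]


# Placeholder paths, for illustration only.
cmd = build_classify_cmd("mags/", "gtdbtk_out/")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. Without --full_tree the placements will still be spread across several subtree files.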

Cheers, Pierre

pchaumeil, Apr 11 '24 05:04