conterminator
GTDB
Hi,
I would love to remove human-contaminated sequences from the GTDB bacteria and archaea (r95) and from NCBI viruses and fungi, as classifications are being badly affected in some of my samples with a high proportion of human DNA. I already have a concatenated .faa file for kraken, and a seqid2taxid.map file. However, because it is a custom-built database that incorporates GTDB, the taxids bear no relation to NCBI IDs. I have a names.dmp and a nodes.dmp file.
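For context, here is a minimal sketch of how the custom taxonomy could be checked for self-consistency before handing it to any tool. It assumes seqid2taxid.map is whitespace-separated (sequence ID, then taxid) and that nodes.dmp uses the NCBI-style "|"-delimited layout with the taxid in the first field. The function names are hypothetical and not part of conterminator or kraken.

```python
def load_node_taxids(node_lines):
    """Collect the taxid (first '|'-delimited field) of every nodes.dmp line."""
    taxids = set()
    for line in node_lines:
        line = line.strip()
        if not line:
            continue
        taxids.add(line.split("|")[0].strip())
    return taxids


def find_unmapped(map_lines, node_taxids):
    """Return (seqid, taxid) pairs whose taxid is absent from nodes.dmp."""
    missing = []
    for line in map_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        seqid, taxid = parts[0], parts[1]
        if taxid not in node_taxids:
            missing.append((seqid, taxid))
    return missing


if __name__ == "__main__":
    # File names as mentioned in the post; adjust the paths as needed.
    with open("nodes.dmp") as nodes:
        known = load_node_taxids(nodes)
    with open("seqid2taxid.map") as mapping:
        for seqid, taxid in find_unmapped(mapping, known):
            print(f"{seqid}\t{taxid}\tnot in nodes.dmp")
```

An empty result would only show that every mapped sequence has a node in the custom taxonomy. It says nothing about whether a given tool accepts non-NCBI taxids.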
Could I tweak conterminator to process this database? It is a 120 Gb sequence database. I can't see how much RAM is required, but naively assuming linear-time scaling, I would hope I could process my database in under a day.
Best wishes,
Andrew