notes for documentation: are bigger databases better? impact of lateral gene transfer/phage
in theory, as we sequence more and more microbial genomes, charcoal should get better and better (balanced somewhat by growing database size and the potential need to dereplicate via species clusters)
it's not clear to me that Reason 2 is a great idea, given the challenges of lateral gene transfer and phage. I guess at the least it will highlight places where people should check their genomes?
although note that Reasons 2 and 3 look at the majority lineage, so the entire contig has to be questionable. hmm.
ah, interesting note about majority lineage. This would (or should) still cause problems with plasmids and with small contigs that are dominated by phage or HGT.
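A minimal sketch of the majority-lineage point, assuming each contig's matched hashes have already been assigned a lineage. The helpers `majority_lineage` and `is_questionable` and the genus strings are hypothetical, chosen only to illustrate the idea, and are not charcoal's actual implementation. It shows why a long contig with a few HGT-derived hashes keeps its genome lineage, while a short contig dominated by a foreign (e.g. phage or plasmid) signal flips.

```python
from collections import Counter


def majority_lineage(hash_lineages):
    """Return (lineage, fraction) for the most common lineage among a contig's hashes.

    Hypothetical helper: hash_lineages is a list of lineage strings,
    one per hash on the contig that matched the database.
    """
    if not hash_lineages:
        return None, 0.0
    counts = Counter(hash_lineages)
    lineage, n = counts.most_common(1)[0]
    return lineage, n / len(hash_lineages)


def is_questionable(hash_lineages, genome_lineage):
    """Flag a contig only when its majority lineage disagrees with the genome's lineage."""
    lineage, _ = majority_lineage(hash_lineages)
    return lineage is not None and lineage != genome_lineage


genome = "g__Escherichia"  # hypothetical genome lineage
# long contig: a handful of HGT-derived hashes do not change the majority
long_contig = [genome] * 40 + ["g__Salmonella"] * 3
# short contig: only three hashes, two of them from a foreign lineage
short_contig = ["g__Salmonella"] * 2 + [genome]

if __name__ == "__main__":
    print(is_questionable(long_contig, genome))   # False
    print(is_questionable(short_contig, genome))  # True
```

The same three foreign hashes are harmless on the long contig but decisive on the short one, which is why small plasmid- or phage-dominated contigs are the likely trouble spots.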
I like the idea of saying "check your genomes." I sort of view the *dirty.fa.gz file as holding either 1) clear contaminants, or 2) contigs that need curation by the user, and clear evidence, before being re-added to the genome.