
Results of a full evaluation of GTDB 25k release89 genomes

[Open] ctb opened this issue 4 years ago • 2 comments

Per #121, and using #122, I ran a (DNA-focused) contamination evaluation on all 25k genomes from the fastani collection of genomes in release 89.

Only 302 genomes had any suspected contamination at all. I attach that list as a .csv.txt file.

Note that I turned off LCA-style evaluation here, so the only reason for contig removal is reason 1 (gather-based).

The parameters are a bit too stringent, I think, so I'm working on that. But this is a first pass.

gtdb-25k-contam.csv.txt

ctb, Jul 10 '20 14:07

Updated! Only ~240 genomes have any cross-kingdom contamination.

gtdb-random-dna.combined_summary.rm.csv.txt

ctb, Jul 12 '20 01:07

So that's... what, about 1% of genomes (~240 of ~25k) with some detectable contamination.
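A quick back-of-the-envelope check of that percentage, as a minimal sketch. It assumes a denominator of exactly 25,000 genomes (the thread only says "25k", so the true count may differ slightly) and uses the 302 and ~240 counts reported above; `percent_flagged` is a hypothetical helper, not part of any tool used in the evaluation.

```python
# Rough fraction-of-genomes check.
# Assumption: exactly 25,000 genomes; the thread only says "25k".
TOTAL_GENOMES = 25_000


def percent_flagged(n_flagged: int, total: int = TOTAL_GENOMES) -> float:
    """Return the percentage of genomes flagged as contaminated."""
    return 100.0 * n_flagged / total


first_pass = percent_flagged(302)  # first pass: any suspected contamination
updated = percent_flagged(240)     # updated run: ~240 with cross-kingdom contamination

print(f"first pass: {first_pass:.2f}%")  # first pass: 1.21%
print(f"updated:    {updated:.2f}%")     # updated:    0.96%
```

Both runs land at roughly 1% of the collection, which matches the estimate above.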

ctb, Jul 12 '20 01:07