C. Titus Brown

Results 518 issues of C. Titus Brown

we could easily do a taxonomic breakdown of _all_ the dirty sequence data after a charcoal run. I wonder if it would reveal anything interesting?

per #121 and using #122, I ran a (DNA-focused) contamination evaluation on all 25k genomes from the fastani collection of genomes in release 89. Only 302 genomes had any suspected...

evaluation

it would be nice to be able to decontaminate GTDB itself, but one of the problems we face there is that charcoal doesn't work well in situations where we have...

this is not a "soon" issue, but there appears to be substantial opportunity for using amino acid k-mers to find contamination... e.g. https://github.com/bluegenes/2020-gtdb-smash/issues/1

futurewish

e.g. "lineage (different from genome lineage at XXX)" would be more informative than what we're currently doing.
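A minimal sketch of how such a label could be built, assuming lineages are available as semicolon-separated strings ordered from domain to species. The function names `first_divergent_rank` and `describe_lineage` and the `RANKS` list are hypothetical and are not taken from charcoal:

```python
# Hypothetical helper, not charcoal's actual code: report the rank at which
# a contig's lineage first departs from the genome's lineage.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def first_divergent_rank(contig_lineage, genome_lineage):
    """Return the first rank name where the two lineages differ, else None.

    Lineages are assumed to be semicolon-separated, domain first. Only ranks
    present in both lineages are compared, so a lineage that is a strict
    prefix of the other counts as "no difference".
    """
    contig = contig_lineage.split(";")
    genome = genome_lineage.split(";")
    for rank, contig_name, genome_name in zip(RANKS, contig, genome):
        if contig_name != genome_name:
            return rank
    return None


def describe_lineage(contig_lineage, genome_lineage):
    """Format the contig lineage, annotated with the divergent rank if any."""
    rank = first_divergent_rank(contig_lineage, genome_lineage)
    if rank is None:
        return contig_lineage
    return f"{contig_lineage} (different from genome lineage at {rank})"
```

For example, `describe_lineage("d__Bacteria;p__A", "d__Bacteria;p__B")` would annotate the contig lineage with the phylum rank, since that is the first position where the two strings disagree.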

order level decontam. see [notebook](https://github.com/dib-lab/charcoal/blob/master/eval/almeida-eval.ipynb)

big results:
```
contig info:
ContigInfo.CLEAN 223120 contigs / 4983.3 Mbp
ContigInfo.NO_IDENT 15797 contigs / 82.5 Mbp
ContigInfo.NO_HASH 4667 contigs / 13.1 Mbp
ContigInfo.DIRTY 2947...
```

evaluation

in `LoombaR_2017__SID1050_bax__bin.11.fa.gz`, we see:
```
breakdown of clean contigs w/gather:
75.29% - to GCF_002159845 s__Anaeromassilibacillus sp002159845
1.56% - to GCF_900104675 s__Angelakisella massiliensis
1.42% - to GCF_002160955 s__Gemmiger_A sp002160955
1.28% -...
```

https://github.com/dib-lab/charcoal/issues/33#issuecomment-641345742 also viz [MetaPalette approach](https://msystems.asm.org/content/1/3/e00020-16)

so @bluegenes did a nice thing with orthodb (see [comment here](https://github.com/dib-lab/charcoal/issues/30#issuecomment-631142093)), and ran charcoal on 10 euk and 10 bac from orthodb. first, it looks like there is some real...

evaluation