C. Titus Brown
e.g. see https://github.com/dib-lab/charcoal/issues/13#issuecomment-628718164

> 637 of the 2000 MAGs I ran have 0 bp of contamination
> The average f_major was 95.1%
> The average contaminant contig is 10,731 bp (sd 16,838 bp) long...
from a conversation with @taylorreiter - per 1(a) in https://github.com/dib-lab/charcoal/issues/87, charcoal starts by running gather against the provided databases. gather is fast and tremendously specific. however, because it is so...
[MetaSanity, An integrated microbial genome evaluation and annotation pipeline](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa512/5840471?guestAccessKey=337cb12b-5494-46c1-a992-754c2458b700) - for inspiration, and also maybe for test data sets.
e.g. for `SRR4033069_bin.1.*report.txt` we have a genome that is mostly unidentified. but we might see that some of these bits belong to other MAGs in this same collection. this is...
crunching numbers on contig-level stats in #73 I get --

strict:

* ContigInfo.NO_IDENT 120035 contigs / 599.6 Mbp
* ContigInfo.CLEAN 109778 contigs / 1193.3 Mbp
* ContigInfo.NO_HASH 6079 contigs / 19.0 Mbp
* ContigInfo.DIRTY...
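for reference, a minimal sketch of how a per-category tally like this could be produced. this is not charcoal's actual code: the `ContigInfo` enum below is a stand-in that only reuses the category names shown above, and `summarize`, `format_summary`, and the example contigs are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class ContigInfo(Enum):
    # Stand-in enum: member names are taken from the stats above,
    # the values are arbitrary and assumed for illustration.
    NO_IDENT = "no_ident"
    CLEAN = "clean"
    NO_HASH = "no_hash"
    DIRTY = "dirty"


def summarize(contigs):
    """Return {status: (n_contigs, total_bp)} from (status, length_bp) pairs."""
    totals = defaultdict(lambda: [0, 0])
    for status, length_bp in contigs:
        totals[status][0] += 1
        totals[status][1] += length_bp
    return {status: (n, bp) for status, (n, bp) in totals.items()}


def format_summary(summary):
    """Format one line per category, largest contig count first."""
    ordered = sorted(summary.items(), key=lambda item: -item[1][0])
    return [
        f"{status} {n} contigs / {bp / 1e6:.1f} Mbp"
        for status, (n, bp) in ordered
    ]


# Made-up contigs, only to exercise the functions.
example = [
    (ContigInfo.CLEAN, 1_200_000),
    (ContigInfo.CLEAN, 300_000),
    (ContigInfo.NO_IDENT, 100_000),
]
report = format_summary(summarize(example))
```

each entry in `report` has the same `<category> <n> contigs / <x> Mbp` shape as the lines above, so the strict and relaxed runs can be compared side by side.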
numbers from contig-level stats in #73 --

```
strict reasons:
3 2076 contigs / 14.6 Mbp
1 1355 contigs / 11.5 Mbp
2 196 contigs / 1.8 Mbp
relaxed reasons:...
```
we are using GTDB taxonomy because -

* GTDB provides a nice collection of representative genomes
* GTDB provides species clusters based on ANI, which matches the approach used in...