
what database(s) do we want to use for charcoal?

Open: ctb opened this issue 5 years ago • 1 comment

GTDB 25k is all well and good, but probably not as sensitive as all of genbank.

Could we / should we build a "screened" genbank database that includes every genbank genome with no significant (sourmash gather) matches in GTDB? That would be fairly straightforward to do.

ctb • May 27 '20 14:05
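
To make the "screened" idea above concrete, here is a minimal sketch of the filtering logic in plain Python. It is not sourmash code and does not reproduce what `sourmash gather` does internally: each genome is represented by a toy set of integer hashes, and "no significant match" is approximated as containment in the pooled GTDB hashes falling below a cutoff. The names `containment` and `screen_genomes`, the 0.1 threshold, and the data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def containment(query: Set[int], reference: Set[int]) -> float:
    """Fraction of the query's hashes that also appear in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


def screen_genomes(
    candidates: Dict[str, Set[int]],
    reference_hashes: Set[int],
    threshold: float = 0.1,  # assumed cutoff, not a sourmash default
) -> List[str]:
    """Return the names of candidates whose containment in the reference is below the threshold."""
    return sorted(
        name
        for name, hashes in candidates.items()
        if containment(hashes, reference_hashes) < threshold
    )


# Toy stand-ins for sketches: pooled GTDB hashes and three genbank genomes.
gtdb_hashes = {1, 2, 3, 4, 5, 6, 7, 8}
genbank = {
    "genome_A": {1, 2, 3, 4},          # fully contained in GTDB, so screened out
    "genome_B": {100, 101, 102, 103},  # no overlap with GTDB, so kept
    "genome_C": {1, 2, 300, 301},      # half contained in GTDB, so screened out
}

novel = screen_genomes(genbank, gtdb_hashes)
print(novel)
```

Only the genomes returned by `screen_genomes` would go into the screened database. In practice the significance cutoff would come from whatever gather reports, rather than from a fixed containment fraction like the one assumed here.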

If we do, should we run checkm/gtdbtk on them as well to estimate contamination? Or rather, should we do any sort of further curation?

I guess we could have 3 databases: gtdb25k, gtdb140k, gtdb140k + (genbank - gtdb). The documentation could have "buyer beware" for the third one.

taylorreiter • May 27 '20 15:05