charcoal
explore protein-based decontamination
this is not a "soon" issue, but there appears to be substantial opportunity for using amino acid k-mers to find contamination...
e.g. https://github.com/bluegenes/2020-gtdb-smash/issues/1
trying this out now at @bluegenes' request, over in #120
if we're serious about this, should probably plan on running prokka to extract proteins. or maybe six-frame translation of DNA is better, b/c could catch fragmented genes w/o reducing specificity?
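for reference, a rough sketch of what the six-frame option could look like: translate both strands in all three offsets, then collect amino acid k-mers that don't span a stop codon. this is only an illustration, not what charcoal or sourmash actually do; the function names and the `ksize=7` default are made up for the example, and it assumes the standard genetic code (NCBI table 1).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Yield the six translations: three offsets on each strand."""
    dna = dna.upper()
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            yield translate(strand[offset:])


def protein_kmers(dna, ksize=7):
    """Collect amino acid k-mers from all six frames.

    ksize is a hypothetical default chosen for the example.
    K-mers containing a stop ('*') or an ambiguous residue ('X') are skipped.
    """
    kmers = set()
    for frame in six_frame_translations(dna):
        for i in range(len(frame) - ksize + 1):
            kmer = frame[i:i + ksize]
            if "*" not in kmer and "X" not in kmer:
                kmers.add(kmer)
    return kmers


if __name__ == "__main__":
    gene = "ATGGCCAAA"
    fragment = "G" + gene  # same coding sequence, shifted out of frame 0
    shared = protein_kmers(gene, ksize=3) & protein_kmers(fragment, ksize=3)
    print(sorted(shared))
```

the point of the toy example at the bottom: the shifted fragment still shares the k-mers from the real reading frame with the intact gene, with no gene calling step needed. the tradeoff is that the five non-coding frames also contribute k-mers, so whether specificity really holds up is something to check on real data.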