
store `is_translated` metadata in minhash

Open bluegenes opened this issue 1 year ago • 0 comments

6-frame translated sketches are useful for searching protein databases, but also come with a few complications.

  • translated signatures should not be compared to each other: containment, Jaccard, and ANI will likely be incorrect or less useful (#2010)
  • translated gather --> protein database will report incorrect % classified (#1087)

We could at least keep track of whether or not the sketch was generated via translation in the MinHash object, perhaps as an `is_translated` property (ref #268). This would enable us to warn users about potential issues, or change behavior if desired.
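A rough sketch of what this could look like, using a toy stand-in rather than sourmash's actual MinHash class. `ToySketch`, its fields, and the `jaccard` helper are all hypothetical names for illustration, and warning only when both sketches are translated is just one possible policy:

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class ToySketch:
    """Hypothetical stand-in for a MinHash sketch; not sourmash's class."""
    hashes: set = field(default_factory=set)
    moltype: str = "protein"
    # Proposed flag: True when the sketch came from 6-frame translated DNA.
    is_translated: bool = False


def jaccard(a: ToySketch, b: ToySketch) -> float:
    """Jaccard similarity that warns when both sketches are translated."""
    if a.is_translated and b.is_translated:
        warnings.warn(
            "both sketches were built by 6-frame translation; "
            "similarity estimates may be unreliable",
            UserWarning,
            stacklevel=2,
        )
    union = a.hashes | b.hashes
    return len(a.hashes & b.hashes) / len(union) if union else 0.0


# Usage: comparing two translated sketches still returns a value, but warns.
query = ToySketch({1, 2, 3}, is_translated=True)
other = ToySketch({2, 3, 4}, is_translated=True)
similarity = jaccard(query, other)
```

The same flag could instead raise an error or switch to different reporting; a warning is the least disruptive option, since existing comparisons keep working.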

first suggested here: https://github.com/sourmash-bio/sourmash/issues/2010#issuecomment-1116516469

bluegenes · Aug 17 '22 01:08