DO NOT MERGE: don't award matches with unknown state the full match score

Open rneher opened this issue 2 years ago • 1 comments

We currently award a match with score_match (3) and penalize a mismatch with mismatch_penalty (1). Ambiguous characters (N) match every other character and are thus always awarded the full score. In some situations, this is undesirable. Of the following two alignments, the second would be the more sensible one (pair C with C and treat N as an insertion):

AT-CCTCC
ATCCNTCC

ATCC-TCC
ATCCNTCC
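
To make the tie concrete, here is a minimal sketch that scores both alignments column by column. The match score (3) and mismatch penalty (1) come from the description above. The linear gap penalty of 6, the function name `score_alignment`, and the `ambiguous_score` parameter are assumptions made for illustration only. This is not Nextclade's actual scoring code, and it ignores terminal-gap handling.

```python
SCORE_MATCH = 3        # value quoted in the issue
MISMATCH_PENALTY = 1   # value quoted in the issue
GAP_PENALTY = 6        # assumed linear gap cost, not taken from the issue


def score_alignment(ref, qry, ambiguous_score):
    """Sum per-column scores of two gapped, equal-length sequences."""
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for r, q in zip(ref, qry):
        if r == "-" or q == "-":
            # Any column containing a gap is penalized, even opposite an N.
            total -= GAP_PENALTY
        elif r == "N" or q == "N":
            # N pairs with any base; the award is the parameter under discussion.
            total += ambiguous_score
        elif r == q:
            total += SCORE_MATCH
        else:
            total -= MISMATCH_PENALTY
    return total


first = ("AT-CCTCC", "ATCCNTCC")
second = ("ATCC-TCC", "ATCCNTCC")

for ambiguous_score in (3, 2):
    print(
        ambiguous_score,
        score_alignment(*first, ambiguous_score),
        score_alignment(*second, ambiguous_score),
    )
```

Under these assumptions, both alignments tie when N gets the full score of 3, because each has seven scored pairs and one gap. Lowering the ambiguous score by just one point is enough to make the second alignment win, which is consistent with keeping the ambiguous score close to the full match score.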

But preferring non-ambiguous matches can also have unexpected consequences: we might get additional indels when the score difference is too large. If we wanted to change this behavior, it would thus be best to keep the ambiguous match score as close to the full match score as possible.

On a related note, our match/mismatch parametrization is not completely redundant, because terminal gaps are not penalized and thus implicitly score 0; match and mismatch scores are effectively compared to that baseline.

rneher avatar Jun 11 '22 15:06 rneher