BioFSharp icon indicating copy to clipboard operation
BioFSharp copied to clipboard

[BUG] Pairwise alignment does not correctly handle gap opening costs

Open HLWeil opened this issue 2 years ago • 0 comments

Describe the bug Pairwise alignment does not correctly score gap opening. Instead of making a low cost, big gap it tends to make smaller gaps where single positions match.

To Reproduce

let s1 = "NLFVAAAAQTKNGQGWVPSNYITPVNSAAA" |> BioArray.ofAminoAcidSymbolString
let s2 = "NLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS" |> BioArray.ofAminoAcidSymbolString

let aaCosts = {
    Open = -15000 // Notice exremely high gap opening cost but normal gap continuation cost
    Continuation = -2
    Similarity = (fun a b -> if a = b then 2 else -3) //aaScoring 
    }

let a = NeedlemanWunsch.runAminoAcidSymbol aaCosts s1 s2

Expected behavior

One big gap, as gap continuation should be favoured over opening new gaps in every case.

Actual behavior

val a: Alignment.Alignment<AminoAcidSymbols.AminoAcidSymbol list,Score> =
  { MetaData = -15114
    Sequences =
     [[N; L; F; V; A; -; -; -; -; -; A; A; A; Q; -; -; -; -; -; T; -; -; -; K;
       -; -; -; -; -; -; -; -; N; G; -; -; -; -; -; Q; G; W; V; P; S; N; Y; I;
       T; P; V; N; S; A; A; A; -; -];
      [N; L; F; V; A; L; Y; D; F; V; A; S; G; D; N; T; L; S; I; T; K; G; E; K;
       L; R; V; L; G; Y; N; H; N; G; E; W; C; E; A; Q; T; K; N; G; Q; G; W; V;
       P; S; N; Y; I; T; P; V; N; S]] }

Many small gaps. As can be seen by the score. Only one gap opening cost was counted. Also notice that gaps are caused by single matching positions.

HLWeil avatar Mar 23 '22 18:03 HLWeil