BioFSharp
BioFSharp copied to clipboard
[BUG] Pairwise alignment does not correctly handle gap opening costs
Describe the bug Pairwise alignment does not correctly score gap opening. Instead of making a low cost, big gap it tends to make smaller gaps where single positions match.
To Reproduce
let s1 = "NLFVAAAAQTKNGQGWVPSNYITPVNSAAA" |> BioArray.ofAminoAcidSymbolString
let s2 = "NLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS" |> BioArray.ofAminoAcidSymbolString
let aaCosts = {
Open = -15000 // Notice exremely high gap opening cost but normal gap continuation cost
Continuation = -2
Similarity = (fun a b -> if a = b then 2 else -3) //aaScoring
}
let a = NeedlemanWunsch.runAminoAcidSymbol aaCosts s1 s2
Expected behavior
One big gap, as gap continuation should be favoured over opening new gaps in every case.
Actual behavior
val a: Alignment.Alignment<AminoAcidSymbols.AminoAcidSymbol list,Score> =
{ MetaData = -15114
Sequences =
[[N; L; F; V; A; -; -; -; -; -; A; A; A; Q; -; -; -; -; -; T; -; -; -; K;
-; -; -; -; -; -; -; -; N; G; -; -; -; -; -; Q; G; W; V; P; S; N; Y; I;
T; P; V; N; S; A; A; A; -; -];
[N; L; F; V; A; L; Y; D; F; V; A; S; G; D; N; T; L; S; I; T; K; G; E; K;
L; R; V; L; G; Y; N; H; N; G; E; W; C; E; A; Q; T; K; N; G; Q; G; W; V;
P; S; N; Y; I; T; P; V; N; S]] }
Many small gaps. As can be seen by the score. Only one gap opening cost was counted. Also notice that gaps are caused by single matching positions.