BioAlignments.jl icon indicating copy to clipboard operation
BioAlignments.jl copied to clipboard

Alignment of 2 LongAminoAcidSeq with AffineGapScoreModel from docs goes out of bounds

Open kool7d opened this issue 2 years ago • 1 comments

Affine alignment is not working, but simple edit distance works.

strucseq = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL"

msaseq = "KIFERCEFARTLKRNGMGGYHGIRLADWVCLARWESSYNTKATNYNSKSTDYGIFQINSRYWCNDGKTPGAVNACGISCNVLLQDDITQAIACAKRVVDPQGIRAWVAWKKHCEQDLTQYQGC"

Expected Behavior

Something like

costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1)
alignment = pairalign(EditDistance(), strucseq, msaseq, costmodel)

provides

PairwiseAlignmentResult{Int64, LongAminoAcidSeq, LongAminoAcidSeq}:
  distance: 62
  seq:   1 KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINS  60
           | | ||| |   || |   | |  |  ||| |  ||  || ||| |   |||||| ||||
  ref:   1 KIFERCEFARTLKRNGMGGYHGIRLADWVCLARWESSYNTKATNYN-SKSTDYGIFQINS  59

  seq:  61 RWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDV 120
           | ||||| |||  | | | |  ||  |||    || | | |  |  |||||   |   |
  ref:  60 RYWCNDGKTPGAVNACGISCNVLLQDDITQAIACA-KRVVDPQGIRAWVAWKKHC-EQD- 116

  seq: 121 QAWIRGCRL 129
                ||
  ref: 117 LTQYQGC-- 123

Current Behavior

costmodel = AffineGapScoreModel(BLOSUM62, gap_open=-10, gap_extend=-1)
alignment = pairalign(GlobalAlignment(), strucseq, msaseq, costmodel)

gives

ERROR: BoundsError: attempt to access 27×27 Matrix{Int64} at index [12, 28]
Stacktrace:
 [1] getindex
   @ .\array.jl:862 [inlined]
 [2] getindex
   @ C:\Users\kool7\.julia\packages\BioAlignments\t4D8A\src\submat.jl:82 [inlined]
 [3] run!(nw::BioAlignments.NeedlemanWunsch{Int64}, a::LongAminoAcidSeq, b::LongAminoAcidSeq, submat::SubstitutionMatrix{AminoAcid, Int64}, start_gap_open_a::Int64, start_gap_extend_a::Int64, middle_gap_open_a::Int64, middle_gap_extend_a::Int64, end_gap_open_a::Int64, end_gap_extend_a::Int64, start_gap_open_b::Int64, start_gap_extend_b::Int64, middle_gap_open_b::Int64, middle_gap_extend_b::Int64, end_gap_open_b::Int64, end_gap_extend_b::Int64)
   @ BioAlignments C:\Users\kool7\.julia\packages\BioAlignments\t4D8A\src\pairwise\algorithms\needleman_wunsch.jl:41
 [4] pairalign(::OverlapAlignment, a::LongAminoAcidSeq, b::LongAminoAcidSeq, score::AffineGapScoreModel{Int64}; score_only::Bool)
   @ BioAlignments C:\Users\kool7\.julia\packages\BioAlignments\t4D8A\src\pairwise\pairalign.jl:137
 [5] pairalign(::OverlapAlignment, a::LongAminoAcidSeq, b::LongAminoAcidSeq, score::AffineGapScoreModel{Int64})
   @ BioAlignments C:\Users\kool7\.julia\packages\BioAlignments\t4D8A\src\pairwise\pairalign.jl:134
 [6] top-level scope
   @ c:\Users\kool7\Google Drive\BioMakie.jl\_research\mitos2.jl:66

kool7d avatar May 26 '22 03:05 kool7d

I can't reproduce this. @kool7d Can you verify that you can reproduce it? Based on the stacktrace, I suspect this is a duplicate of #46 .

jakobnissen avatar Jun 15 '22 13:06 jakobnissen