Biostrings icon indicating copy to clipboard operation
Biostrings copied to clipboard

trimLRPatterns with indels not trimming as intended in case of net insertions or deletions between subject and pattern?

Open jan-glx opened this issue 2 years ago • 2 comments

In the following examples I'd expect trimLRPatterns to always leave ATCG:

library(Biostrings)
subject <- DNAStringSet(c("AAATCTCTCATCG"))
trimLRPatterns(subject = subject, Lpattern = "AAATCTCTC", Rpattern = "" , with.Lindels = TRUE, max.Lmismatch = 0.3 )
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AAAATCTCTC", Rpattern = "" , with.Lindels = TRUE, max.Lmismatch = 0.3 )
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     3 TCG
trimLRPatterns(subject = subject, Lpattern = "AATCTCTC", Rpattern = "" , with.Lindels = TRUE, max.Lmismatch = 0.3 )
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     5 CATCG

Created on 2022-11-18 by the reprex package (v2.0.1)

jan-glx avatar Nov 18 '22 14:11 jan-glx

I think the issue is that Biostrings does not consider terminal gaps in alignments to be indels. See https://github.com/Bioconductor/pwalign/issues/2.

ATCG is retained if you specify with.Lindels = FALSE.

subject <- DNAStringSet(c("AAATCTCTCATCG"))
trimLRPatterns(subject = subject, Lpattern = "AAATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AAAATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     6 TCATCG

acvill avatar Jun 09 '23 14:06 acvill

Possibly. But while that issue might be considered odd behavior, this here is a bug, right? Though by I am even more confused by your last with.Lindels = FALSE example right now.

jan-glx avatar Jun 14 '23 13:06 jan-glx