matchPattern wildcard for AAstringset

Open swuyts opened this issue 5 years ago • 3 comments

Hi there,

The function matchPattern allows the use of wildcards for DNA sequences:

testnucl <- DNAString("AAAACTAAA")
matchPattern("CNA", testnucl, fixed = FALSE)

Returns:

Views on a 9-letter DNAString subject
subject: AAAACTAAA
views:
    start end width
[1]     5   7     3 [CTA]

While the same strategy for amino acids:

testprot <- AAString("MMMMPSTMMM")
matchPattern("PXT", testprot, fixed = FALSE)

Returns:

Error in normargFixed(fixed, subject) : 
  'fixed' value only supported for a DNA or RNA subject (you can only use 'fixed=TRUE' with your subject)

Is there a reason for this wildcard search not being allowed on amino acid level?

Many thanks in advance, Sander

swuyts avatar Jan 22 '20 09:01 swuyts

Sorry for the delay.

Is there a reason for this wildcard search not being allowed on amino acid level?

No particular reason. It's just that it's not implemented at the moment. New functionalities get implemented based on user needs and you're the first person to express that need. Sounds like it would be a nice improvement. Unfortunately new developments in the Biostrings package have been put on the back burner for a few years now because of other priorities in Bioconductor and I don't think I'll be able to do any work on this in the near future. Want to send a PR?
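
Until something like this is implemented, a rough workaround could be to fall back on base R regular expressions. The sketch below is only an illustration under stated assumptions: `match_aa_wildcard` is a hypothetical helper, not part of Biostrings, and it assumes that "X" should mean "any single residue" and that the pattern contains only plain letters (no regex metacharacters such as "*"). It accepts a character string, or anything `as.character()` can convert, which should include an AAString when Biostrings is loaded.

```r
# Hypothetical helper (NOT a Biostrings function): wildcard search in a
# protein sequence using base R regular expressions.
# Assumption: `wildcard` ("X" by default) stands for any single residue.
match_aa_wildcard <- function(pattern, subject, wildcard = "X") {
  subject <- as.character(subject)  # plain string, or an AAString converted to one
  # Translate the wildcard letter into the regex "any single character".
  regex <- gsub(wildcard, ".", pattern, fixed = TRUE)
  # Wrapping the pattern in a lookahead makes gregexpr() report
  # overlapping matches as well as non-overlapping ones.
  hits <- gregexpr(paste0("(?=", regex, ")"), subject, perl = TRUE)[[1]]
  if (hits[1] == -1L) {
    # No match: return an empty table with the same columns.
    return(data.frame(start = integer(0), end = integer(0)))
  }
  start <- as.integer(hits)
  data.frame(start = start, end = start + nchar(pattern) - 1L)
}

match_aa_wildcard("PXT", "MMMMPSTMMM")
```

For the example from the original post this should report a single hit with start 5 and end 7. It returns a plain data frame rather than a Views object, so it is not a drop-in replacement for matchPattern.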

Thanks, H.

hpages avatar Apr 17 '20 20:04 hpages

No worries!

I'll have a look at your current implementation and see what I can do. No promises :)

swuyts avatar Apr 27 '20 11:04 swuyts

This is fixed in https://github.com/ahl27/Biostrings/tree/wildcardAA and will be put up as a PR after #97 is merged.

ahl27 avatar Mar 30 '23 17:03 ahl27