Biostrings
matchPattern wildcard for AAStringSet
Hi there,
The function matchPattern allows the use of wildcards for DNA sequences:
library(Biostrings)
testnucl <- DNAString("AAAACTAAA")
matchPattern("CNA", testnucl, fixed = FALSE)
Returns:
Views on a 9-letter DNAString subject
subject: AAAACTAAA
views:
      start end width
  [1]     5   7     3 [CTA]
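For context, with fixed = FALSE the pattern letters are read as IUPAC ambiguity codes, so the N in "CNA" stands for any of A, C, G or T. The idea can be sketched in base R by translating the pattern into a regular expression. This is only an illustration, not how Biostrings implements it: `iupac_to_regex` is a hypothetical helper, it covers only a subset of the codes, and it handles ambiguity in the pattern only.

```r
# Hypothetical helper: translate a DNA pattern containing IUPAC ambiguity
# codes into a regular expression. Only a subset of the codes is listed.
iupac_to_regex <- function(pattern) {
  codes <- c(A = "A", C = "C", G = "G", T = "T",
             R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
             K = "[GT]", M = "[AC]", N = "[ACGT]")
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  # Fail early on letters this sketch does not know about.
  stopifnot(all(chars %in% names(codes)))
  paste(codes[chars], collapse = "")
}

subject <- "AAAACTAAA"
hit <- regexpr(iupac_to_regex("CNA"), subject)  # first match only
start <- as.integer(hit)
end <- start + attr(hit, "match.length") - 1L
c(start = start, end = end)  # start 5, end 7, the same view as above
```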
While the same strategy for amino acids:
testprot <- AAString("MMMMPSTMMM")
matchPattern("PXT", testprot, fixed = FALSE)
Returns:
Error in normargFixed(fixed, subject) :
'fixed' value only supported for a DNA or RNA subject (you can only use 'fixed=TRUE' with your subject)
Is there a reason for this wildcard search not being allowed on amino acid level?
Many thanks in advance, Sander
Sorry for the delay.
> Is there a reason for this wildcard search not being allowed on amino acid level?
No particular reason. It's just that it's not implemented at the moment. New functionalities get implemented based on user needs and you're the first person to express that need. Sounds like it would be a nice improvement. Unfortunately new developments in the Biostrings package have been put on the back burner for a few years now because of other priorities in Bioconductor and I don't think I'll be able to do any work on this in the near future. Want to send a PR?
Thanks, H.
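Until something like this is available in matchPattern itself, one possible workaround is to run the search on a plain character string with a regular expression. The sketch below is an assumption on my part rather than anything Biostrings provides: `match_aa_wildcard` is a hypothetical helper that treats X in the pattern as "any residue". It reports non-overlapping matches only, and any other regex metacharacters in the pattern (for example "." or "*") would be interpreted as regex syntax.

```r
# Hypothetical workaround: treat "X" in an amino acid pattern as "any residue"
# by converting it to the regex wildcard "." and searching a plain string.
match_aa_wildcard <- function(pattern, subject) {
  regex <- gsub("X", ".", pattern, fixed = TRUE)
  hits <- gregexpr(regex, subject)[[1]]
  # gregexpr() signals "no match" with a single -1.
  if (hits[1] == -1L) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- as.integer(hits)
  data.frame(start = starts,
             end = starts + attr(hits, "match.length") - 1L)
}

# With Biostrings loaded, as.character(testprot) would give this string.
res <- match_aa_wildcard("PXT", "MMMMPSTMMM")
res  # one hit: start 5, end 7 (PST)
```

The start and end columns could then be passed to Views() or IRanges() if a Biostrings-style result is needed.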
No worries!
I'll have a look at your current implementation and see what I can do. No promises :)
This is fixed in https://github.com/ahl27/Biostrings/tree/wildcardAA and will be put up as a PR after #97 is merged.