Biostrings
Biostrings copied to clipboard
Give `matchProbePair` the `fixed` parameter to allow IUPAC matching?
I recently need to predict amplicons using primers with lots of ambiguity codes. The matchProbePair
function seemed perfect but it does not interpret the IUPAC codes. I noticed the code uses matchPattern
to find primer matches and that function has the fixed
option that allows interpretation of ambiguity codes when fixed=FALSE
.
Can the matchProbePair
function also have this option and pass its value to matchPattern
like so?
matchProbePair <- function(Fprobe, Rprobe, subject, algorithm="auto",
logfile=NULL, verbose=FALSE, fixed = FALSE)
{
## This won't copy the data if Fprobe and Rprobe are already DNAString objects
F <- DNAString(Fprobe)
R <- DNAString(Rprobe)
## F and R hits on the + strand
Fp_hits <- start(matchPattern(F, subject, algorithm=algorithm, fixed = fixed))
Rp_hits <- start(matchPattern(R, subject, algorithm=algorithm, fixed = fixed))
## F and R hits on the - strand
Fm_hits <- end(matchPattern(reverseComplement(F), subject, algorithm=algorithm, fixed = fixed))
Rm_hits <- end(matchPattern(reverseComplement(R), subject, algorithm=algorithm, fixed = fixed))
if (verbose) {
cat("Fp_hits:", Fp_hits, " Rp_hits:", Rp_hits,
" Fm_hits:", Fm_hits, " Rm_hits:", Rm_hits, "\n")
}
matches0 <- Biostrings:::reduceProbePairMatches(c(Fp_hits, Rp_hits), c(Fm_hits, Rm_hits))
ans <- Views(subject, start=matches0$start, end=matches0$end)
if (!is.null(logfile)) {
nFp <- length(Fp_hits)
nRp <- length(Rp_hits)
nFm <- length(Fm_hits)
nRm <- length(Rm_hits)
nmatches0 <- length(ans)
## cat("", ..., sep="\t") is a trick to get an extra tab
cat("", nFp, nRp, nFm, nRm, nmatches0, file=logfile, sep="\t")
}
ans
}
I used the code above to do my analysis and it seemed to work.
I can submit a PR if you are interested.
Thanks!
Pull requests are always encouraged. Please prepare and we will look it over more closely. Thank you.
Hi @zachary-foster @lshep
has there been any update on this issue?
@zachary-foster Is your function above working as intended?
Thank you for your help.
Thanks @zachary-foster for code. This was very helpful.
@johanneswerner Sorry, I did not notice your question originally. The function above has been working as far as I can tell. If I remember correctly, I tried to get the package built on my computer to set up a pull request, but could not get the package to build for some reason. Since I could not run the tests to ensure that the changes would not cause problems, I gave up for the time and have not gotten back to it.
@louellette No problem!
@zachary-foster Do you still remember what you have done? Maybe we can make this pull request set up together and add this functionality.
I looked at the repo still on my computer and I basically just made the options max.mismatch
, min.mismatch
, with.indels
, fixed
from matchPattern
available to the matchProbePair
function call. Attached is the version of the matchProbePair
source with my changes. Its the only file I changed.
Thank you @zachary-foster
devtools::check()
currently fails on the current state of the package due to unresolved dependencies to BiocGenerics
, those should hopefully be resolved when R 4.1 is released and I can switch to the Bioconductor devel channel. I will then try to include your changes and try to rebuild the package. I think the only thing missing would be additional tests ... I hope. :-) Thank you for sharing your code.
Sounds good, thanks!
Just merged in this functionality. matchProbePair
now allows a ...
argument that passes any further arguments to matchPattern
, allowing for the functionality you're describing without manually listing out all the arguments.