BioSequences.jl icon indicating copy to clipboard operation
BioSequences.jl copied to clipboard

Loosen search iterator

Open CiaranOMara opened this issue 3 years ago • 3 comments

This PR loosens the Search iterator to accommodate iterating in forward and reverse directions.

It needs a little more work to support the ApproxSearchQuery. It wasn't immediately obvious how to have the iterator generically include or pass through additional arguments to an underlying function. The other approach is to have ApproxSearchQuery own the additional argument for the number of allowed errors - comments would be helpful.

Let me know whether this seems like a good direction for a search iterator.

CiaranOMara avatar Jan 30 '22 04:01 CiaranOMara

Codecov Report

Merging #221 (a9d1355) into master (d3d8fc4) will decrease coverage by 0.18%. The diff coverage is 50.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
- Coverage   88.57%   88.39%   -0.19%     
==========================================
  Files          31       31              
  Lines        2425     2430       +5     
==========================================
  Hits         2148     2148              
- Misses        277      282       +5     
Flag Coverage Δ
unittests 88.39% <50.00%> (-0.19%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/BioSequences.jl 72.72% <50.00%> (-21.40%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d3d8fc4...a9d1355. Read the comment docs.

codecov[bot] avatar Jan 30 '22 04:01 codecov[bot]

That seems very desirable to me, I remember thinking looking at the search code - "hmm how would you make Approx go backwards?".

TransGirlCodes avatar May 30 '22 12:05 TransGirlCodes

The Search iterator would become generic or complete in its current state if all the queries owned their additional arguments. However, changing ApproxSearchQuery is a breaking change. So a design discussion is needed (hence converting this PR to draft).

I think we would be in a better position if we constrain our findnext and findprev method signatures to the following.


function findnext(query, seq::BioSequence, start::Integer)
    ...
end

function findprev(query, seq::BioSequence, start::Integer)
    ...
end

All the peripheral findfirst, findlast, findall methods can become generic if we adopt this constraint.

In terms of the ApproxSearchQuery, it would need to adopt the additional k argument somehow, and perhaps another wrapper/layer may be required to optimise for the reuse of ApproxSearchQuery's work.

CiaranOMara avatar Jun 06 '22 00:06 CiaranOMara