BioAlignments.jl
BioAlignments.jl copied to clipboard
Add support for 'meta' alignment operations (`OP_PAD` and `OP_HARD_CLIP`)
Types of changes
This PR implements the following changes:
- [x] :sparkles: New feature (A non-breaking change which adds functionality).
- [x] :bug: Bug fix (A non-breaking change, which fixes an issue).
- [ ] :boom: Breaking change (fix or feature that would cause existing functionality to change).
:clipboard: Additional detail
Adds support for the OP_PAD
operation, and fixes the implementation of the OP_HARD_CLIP
operation. I have dubbed these two operations 'meta' operations, as they contain information about the origin and context of the alignment, but have no bearing on the content (i.e. reference or query) of the alignment.
New features
- Added function
ismetaop
, which returnstrue
if passedOP_PAD
orOP_HARD_CLIP
- Creating an alignment with a
OP_PAD
operation no longer throws an error - Added validity check for
OP_HARD_CLIP
orOP_SOFT_CLIP
in the wrong place According to the SAM specification, these have to be present at the end of an alignment, with soft clips optionally buffering hard clips from the rest of the alignment
Changed functionality
-
cigar(::Alignment)
now uses theAlignmentAnchor.alnpos
to print operation length -
OP_HARD_CLIP
is no longer considered an insert operation, andisinsertop(OP_HARD_CLIP)
now returnsfalse
- The help text for
OP_PAD
indicates it's supported now - Trying to create an alignment with a hard or soft clip in an improper place will throw an error (I know I said this above, but it's probably the most likely addition to break something, even though it does conform to spec)
Justification
BioAlignments should be able to handle all of the operations defined by the SAM specification. As it currently is, BioAlignments is unable even to parse the alignments in Section 1.1 "An example" of the specification. While the work on clips may seem out-of-scope for a change adding an operation support, the way clips work and pads work are very similar and needed to be considered together.
Example
Using query sequence "r002" from Section 1.1 of the SAM specification:
BioAlignments 2.0.0
julia> using BioAlignments
julia> Alignment("3S6M1P1I4M", 1, 9)
ERROR: The P CIGAR operation is not yet supported.
Stacktrace:
[1] Alignment(cigar::String, seqpos::Int64, refpos::Int64)
...
This PR
julia> using BioAlignments
julia> Alignment("3S6M1P1I4M", 1, 9)
Alignment:
aligned range:
seq: 0-14
ref: 8-18
CIGAR string: 3S6M1P1I4M
My testing also indicates that this fixes #56,
BioAlignments 2.0.0
julia> using BioAlignments, BioSequences
julia> aligned_sequence = AlignedSequence(dna"AGAAGTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCAGTTTAAA",
Alignment("75H25M75H", 1, 1))
---------------------------------------------------------------------------·························---------------------------------------------------------------------------
AGAAGTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCAGTTTAAAError showing value of type AlignedSequence{LongDNASeq}:
ERROR: BoundsError: attempt to access LongDNASeq at index [71]
Stacktrace:
...
This PR
julia> using BioAlignments, BioSequences
julia> aligned_sequence = AlignedSequence(dna"AGAAGTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCAGTTTAAA",
Alignment("75H25M75H", 1, 1))
·························
AGAAGTTTATCTGTGTGAACTTCTT
:ballot_box_with_check: Checklist
- [x] :art: The changes implemented is consistent with the julia style guide.
- [x] :blue_book: I have updated and added relevant docstrings, in a manner consistent with the documentation styleguide.
- [x] :blue_book: I have added or updated relevant user and developer manuals/documentation in
docs/src/
. - [x] :ok: There are unit tests that cover the code changes I have made.
- [x] :ok: The unit tests cover my code changes AND they pass.
- [x] :pencil: I have added an entry to the
[UNRELEASED]
section of the manually curatedCHANGELOG.md
file for this repository. - [x] :ok: All changes should be compatible with the latest stable version of Julia.
- [x] :thought_balloon: I have commented liberally for any complex pieces of internal code.
@jakobnissen, @CiaranOMara, could I get a review from one of you?
Codecov Report
Base: 87.60% // Head: 88.20% // Increases project coverage by +0.59%
:tada:
Coverage data is based on head (
96f8ee7
) compared to base (b92ccf3
). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## master #64 +/- ##
==========================================
+ Coverage 87.60% 88.20% +0.59%
==========================================
Files 16 16
Lines 1138 1178 +40
==========================================
+ Hits 997 1039 +42
+ Misses 141 139 -2
Impacted Files | Coverage Δ | |
---|---|---|
src/alignment.jl | 84.66% <100.00%> (+4.19%) |
:arrow_up: |
src/operations.jl | 88.23% <100.00%> (+0.73%) |
:arrow_up: |
src/pairwise/alignment.jl | 96.99% <100.00%> (+0.41%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
LGTM. Thanks for putting in the work. W.r.t #44 - we should roll back the change, then merge #72, then we can apply this. This debacle all suggests to me that moving forward, we need to be more careful about what is API and what is internals, but let's have that change later.
I agree with @jakobnissen, we should postpone merging this until #44 and #72 are both in master
, rather than reverting 349e204.
Failed downstream is expected: this is a breaking change due to the inclusion of #44.