BioAlignments.jl icon indicating copy to clipboard operation
BioAlignments.jl copied to clipboard

Add support for 'meta' alignment operations (`OP_PAD` and `OP_HARD_CLIP`)

Open MillironX opened this issue 2 years ago • 4 comments

Types of changes

This PR implements the following changes:

  • [x] :sparkles: New feature (A non-breaking change which adds functionality).
  • [x] :bug: Bug fix (A non-breaking change, which fixes an issue).
  • [ ] :boom: Breaking change (fix or feature that would cause existing functionality to change).

:clipboard: Additional detail

Adds support for the OP_PAD operation, and fixes the implementation of the OP_HARD_CLIP operation. I have dubbed these two operations 'meta' operations, as they contain information about the origin and context of the alignment, but have no bearing on the content (i.e. reference or query) of the alignment.

New features

  1. Added function ismetaop, which returns true if passed OP_PAD or OP_HARD_CLIP
  2. Creating an alignment with a OP_PAD operation no longer throws an error
  3. Added validity check for OP_HARD_CLIP or OP_SOFT_CLIP in the wrong place According to the SAM specification, these have to be present at the end of an alignment, with soft clips optionally buffering hard clips from the rest of the alignment

Changed functionality

  1. cigar(::Alignment) now uses the AlignmentAnchor.alnpos to print operation length
  2. OP_HARD_CLIP is no longer considered an insert operation, and isinsertop(OP_HARD_CLIP) now returns false
  3. The help text for OP_PAD indicates it's supported now
  4. Trying to create an alignment with a hard or soft clip in an improper place will throw an error (I know I said this above, but it's probably the most likely addition to break something, even though it does conform to spec)

Justification

BioAlignments should be able to handle all of the operations defined by the SAM specification. As it currently is, BioAlignments is unable even to parse the alignments in Section 1.1 "An example" of the specification. While the work on clips may seem out-of-scope for a change adding an operation support, the way clips work and pads work are very similar and needed to be considered together.

Example

Using query sequence "r002" from Section 1.1 of the SAM specification:

BioAlignments 2.0.0

julia> using BioAlignments

julia> Alignment("3S6M1P1I4M", 1, 9)
ERROR: The P CIGAR operation is not yet supported.
Stacktrace:
 [1] Alignment(cigar::String, seqpos::Int64, refpos::Int64)
...

This PR

julia> using BioAlignments

julia> Alignment("3S6M1P1I4M", 1, 9)
Alignment:
  aligned range:
    seq: 0-14
    ref: 8-18
  CIGAR string: 3S6M1P1I4M

My testing also indicates that this fixes #56,

BioAlignments 2.0.0

julia> using BioAlignments, BioSequences

julia> aligned_sequence = AlignedSequence(dna"AGAAGTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCAGTTTAAA",
       Alignment("75H25M75H", 1, 1))
---------------------------------------------------------------------------·························---------------------------------------------------------------------------
AGAAGTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCAGTTTAAAError showing value of type AlignedSequence{LongDNASeq}:
ERROR: BoundsError: attempt to access LongDNASeq at index [71]
Stacktrace:
...

This PR

julia> using BioAlignments, BioSequences

julia> aligned_sequence = AlignedSequence(dna"AGAAGTTTATCTGTGTGAACTTCTTGGCTTAGTATCGTTGAGAAGAATCGAGAGATTAGTGCAGTTTAAA",
              Alignment("75H25M75H", 1, 1))
·························
AGAAGTTTATCTGTGTGAACTTCTT

:ballot_box_with_check: Checklist

  • [x] :art: The changes implemented is consistent with the julia style guide.
  • [x] :blue_book: I have updated and added relevant docstrings, in a manner consistent with the documentation styleguide.
  • [x] :blue_book: I have added or updated relevant user and developer manuals/documentation in docs/src/.
  • [x] :ok: There are unit tests that cover the code changes I have made.
  • [x] :ok: The unit tests cover my code changes AND they pass.
  • [x] :pencil: I have added an entry to the [UNRELEASED] section of the manually curated CHANGELOG.md file for this repository.
  • [x] :ok: All changes should be compatible with the latest stable version of Julia.
  • [x] :thought_balloon: I have commented liberally for any complex pieces of internal code.

MillironX avatar Jan 27 '22 15:01 MillironX

@jakobnissen, @CiaranOMara, could I get a review from one of you?

MillironX avatar May 05 '22 16:05 MillironX

Codecov Report

Base: 87.60% // Head: 88.20% // Increases project coverage by +0.59% :tada:

Coverage data is based on head (96f8ee7) compared to base (b92ccf3). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
+ Coverage   87.60%   88.20%   +0.59%     
==========================================
  Files          16       16              
  Lines        1138     1178      +40     
==========================================
+ Hits          997     1039      +42     
+ Misses        141      139       -2     
Impacted Files Coverage Δ
src/alignment.jl 84.66% <100.00%> (+4.19%) :arrow_up:
src/operations.jl 88.23% <100.00%> (+0.73%) :arrow_up:
src/pairwise/alignment.jl 96.99% <100.00%> (+0.41%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar May 05 '22 18:05 codecov[bot]

LGTM. Thanks for putting in the work. W.r.t #44 - we should roll back the change, then merge #72, then we can apply this. This debacle all suggests to me that moving forward, we need to be more careful about what is API and what is internals, but let's have that change later.

jakobnissen avatar May 06 '22 10:05 jakobnissen

I agree with @jakobnissen, we should postpone merging this until #44 and #72 are both in master, rather than reverting 349e204.

MillironX avatar May 09 '22 13:05 MillironX

Failed downstream is expected: this is a breaking change due to the inclusion of #44.

MillironX avatar Oct 01 '22 19:10 MillironX