
Error when opening a BAM file generated by HISAT2 version 2.2.0

Open yuifu opened this issue 3 years ago • 3 comments

I have been using XAM.jl to process BAM files. Today I came across an error when opening a BAM file generated by HISAT2 version 2.2.0. No error occurred for a BAM file generated by HISAT2 version 2.1.0 from the same FASTQ file.

Expected Behavior

No errors are expected.

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")

Current Behavior

julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")
ERROR: ArgumentError: malformed metainfo at line 68
Stacktrace:
 [1] readheader!(::TranscodingStreams.TranscodingStream{TranscodingStreams.Noop,Base.GenericIOBuffer{Array{UInt8,1}}}, ::XAM.SAM.Header, ::Tuple{Int64,Int64}) at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/readrecord.jl:318
 [2] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/reader.jl:13 [inlined]
 [3] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/sam/reader.jl:38 [inlined]
 [4] init_bam_reader(::BGZFStreams.BGZFStream{IOStream}) at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:96
 [5] init_bam_reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:125 [inlined]
 [6] #Reader#2 at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:36 [inlined]
 [7] Reader at /Users/ozakiharuka/.julia/packages/XAM/ahh4D/src/bam/reader.jl:31 [inlined]
 [8] #open#1 at /Users/ozakiharuka/.julia/packages/BioGenerics/cCuGr/src/IO.jl:42 [inlined]
 [9] open(::Type{XAM.BAM.Reader}, ::String) at /Users/ozakiharuka/.julia/packages/BioGenerics/cCuGr/src/IO.jl:42
 [10] top-level scope at REPL[2]:1
(v1.3) pkg> status
....
  [d759349c] XAM v0.2.3
....
julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Possible Solution / Implementation

I have no idea, but the HISAT2 release notes might help:

HISAT 2.2.0 release 2/6/2020

This major version update includes a new feature to handle “repeat” reads. Based on sets of 100-bp simulated and 101-bp real reads that we tested, we found that 2.6-3.4% and 1.4-1.8% of the reads were mapped to >5 locations and >100 locations, respectively. Attempting to report all alignments would likely consume a prohibitive amount of disk space. In order to address this issue, our repeat indexing and alignment approach directly aligns reads to repeat sequences, resulting in one repeat alignment per read. HISAT2 provides application programming interfaces (API) for C++, Python, and JAVA that rapidly retrieve genomic locations from repeat alignments for use in downstream analyses. Other minor bug fixes are also included as follows:

  • Fixed occasional sign (+ or -) issues of template lengths in SAM file
  • Fixed duplicate read alignments in SAM file
  • Skip a splice site if exon’s last base or first base is ambiguous (N)
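
To narrow this down, it may help to look at the offending header line directly. The following is a minimal sketch, not XAM.jl's own code. It assumes that "line 68" in the error refers to a line of the SAM header text embedded in the BAM file, and it relies on the BAM layout from the SAM/BAM specification (magic bytes, then a little-endian Int32 text length, then the header text). `bam_header_text` and `header_line` are hypothetical helper names introduced here for illustration.

```julia
# Sketch: pull the SAM header text out of an already-decompressed BAM stream.
# Per the SAM/BAM specification, the stream starts with the magic bytes
# "BAM\1", then a little-endian Int32 text length, then the header text.
function bam_header_text(io::IO)
    magic = read(io, 4)
    magic == b"BAM\x01" || error("not a BAM stream")
    l_text = ltoh(read(io, Int32))   # header length in bytes
    return String(read(io, l_text))
end

# Hypothetical helper: return header line `n` (1-based), or `nothing`
# if the header has fewer lines than that.
function header_line(text::AbstractString, n::Integer)
    lines = split(text, '\n')
    return 1 <= n <= length(lines) ? lines[n] : nothing
end

# Usage, assuming BGZFStreams.jl (the package seen in the stack trace)
# is installed:
#   using BGZFStreams
#   text = open(io -> bam_header_text(BGZFStream(io)), "SRR8452726_1.sort.bam")
#   println(header_line(text, 68))
```

Comparing the printed line with the same line from the HISAT2 2.1.0 file should show which header record the parser rejects.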

Steps to Reproduce (for bugs)

  1. Download the BAM file from here.
  2. Run the following code:
julia> using XAM
julia> reader = open(BAM.Reader, "SRR8452726_1.sort.bam")

Context

Your Environment

  • Julia Version 1.3.1
  • XAM v0.2.3
  • OS: macOS (x86_64-apple-darwin18.6.0)
  • CPU: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
  • WORD_SIZE: 64
  • LIBM: libopenlibm
  • LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
(v1.3) pkg> status
    Status `~/.julia/environments/v1.3/Project.toml`
  [c7e460c6] ArgParse v1.1.0
  [8e4a8c10] BED v0.1.0
  [00701ae9] BioAlignments v2.0.0
  [7e6ae17a] BioSequences v2.0.5
  [336ed68f] CSV v0.6.1
  [9961bab8] Cbc v0.6.7
  [aaaa29a8] Clustering v0.14.0
  [8f4d0f93] Conda v1.4.1
  [a93c6f00] DataFrames v0.20.2
  [968ba79b] DocOpt v0.4.0
  [8f5d6c58] EzXML v1.1.0
  [c2308a5c] FASTX v1.1.2
  [652a1917] Fire v0.1.0
  [587475ba] Flux v0.11.0
  [899a7d2d] GenomicFeatures v2.0.0
  [c27321d9] Glob v1.3.0
  [a2cc645c] GraphPlot v0.4.2
  [cd3eb016] HTTP v0.8.16
  [7073ff75] IJulia v1.21.2
  [682c06a0] JSON v0.21.0
  [4076af6c] JuMP v0.21.2
  [093fc24a] LightGraphs v1.3.1
  [9b87118b] PackageCompiler v1.1.1
  [91a5bcdd] Plots v1.0.10
  [d330b81b] PyPlot v2.9.0
  [777a009e] ReadCoverage v0.1.1 #master (https://github.com/bioinfo-tsukuba/ReadCoverage.jl)
  [3cdcf5f2] RecipesBase v1.0.0
  [01d81517] RecipesPipeline v0.1.3
  [2913bbd2] StatsBase v0.33.0
  [70df011a] TableReader v0.4.0
  [9c690861] TensorToolbox v1.0.1
  [d759349c] XAM v0.2.3
  [ddb6d928] YAML v0.4.0
  [8bb1440f] DelimitedFiles
  [de0858da] Printf

yuifu, Jul 17 '20 10:07