Type instability when reading a BAM file
I don't know if this is a significant bottleneck, but I noticed some type instability when looping over BAM file records:
function read_bam(bam_file)
# iterate over BAM records
chr = String[]
left_pos = Int64[]
right_pos = Int64[]
for record in reader
push!(chr, BAM.refname(record))
push!(left_pos, leftposition(record))
push!(right_pos, rightposition(record))
end
DataFrame(chr = chr, left = left_pos, right = right_pos)
end
@code_warntype read_bam(some_file)
!julia> @code_warntype read_bam(bam_file)
Variables:
#self#
bam_file
chr::Array{String,1}
left_pos::Array{Int64,1}
right_pos::Array{Int64,1}
itemT
Body:
begin
chr::Array{String,1} = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{String,1}, svec(Any, Int64), Array{String,1}, 0, 0, 0)) # line 4:
left_pos::Array{Int64,1} = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, 0, 0)) # line 5:
right_pos::Array{Int64,1} = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Int64,1}, svec(Any, Int64), Array{Int64,1}, 0, 0, 0)) # line 6:
SSAValue(0) = Main.reader
#temp#::ANY = (Base.start)(SSAValue(0))::ANY
9:
unless !((Base.done)(SSAValue(0), #temp#::ANY)::ANY)::ANY goto 39
SSAValue(1) = (Base.next)(SSAValue(0), #temp#::ANY)::ANY
record::ANY = (Core.getfield)(SSAValue(1), 1)::ANY
#temp#::ANY = (Core.getfield)(SSAValue(1), 2)::ANY # line 7:
SSAValue(5) = ($(QuoteNode(BioAlignments.BAM.refname)))(record::ANY)::String
$(Expr(:inbounds, false))
# meta: location array.jl push! 652
SSAValue(6) = (Base.bitcast)(UInt64, (Base.check_top_bit)(1)::Int64)
$(Expr(:foreigncall, :(:jl_array_grow_end), Void, svec(Any, UInt64), :(chr), 0, SSAValue(6), 0)) # line 653:
# meta: location abstractarray.jl endof 134
# meta: location abstractarray.jl linearindices 99
# meta: location abstractarray.jl indices1 71
# meta: location abstractarray.jl indices 64
SSAValue(9) = (Base.arraysize)(chr::Array{String,1}, 1)::Int64
# meta: pop location
# meta: pop location
# meta: pop location
# meta: pop location
(Base.arrayset)(chr::Array{String,1}, SSAValue(5), (Base.select_value)((Base.slt_int)(SSAValue(9), 0)::Bool, 0, SSAValue(9))::Int64)::Array{String,1}
# meta: pop location
$(Expr(:inbounds, :pop)) # line 8:
(Main.push!)(left_pos::Array{Int64,1}, (Main.leftposition)(record::ANY)::ANY)::Array{Int64,1} # line 9:
(Main.push!)(right_pos::Array{Int64,1}, (Main.rightposition)(record::ANY)::ANY)::Array{Int64,1}
37:
goto 9
39: # line 11:
return $(Expr(:invoke, MethodInstance for (::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.DataFrame}), :($(QuoteNode(Type))), :($(Expr(:invoke, MethodInstance for vector_any(::Any, ::Vararg{Any,N} where N), :(Base.vector_any)$
end::DataFrames.DataFrame
I think the ANY type for record and temp indicate some type instability
Thats very interesting that record should be of type ANY. At first glance, I think that should not be happening. Thanks for flagging this!
Perhaps your iterator has the same trouble as those mentioned here: https://github.com/JuliaLang/julia/pull/26852
?
Pete
Peter M. Haverty, Ph.D. Genentech, Inc. [email protected]
On Wed, Apr 18, 2018 at 2:37 PM, Ben J. Ward [email protected] wrote:
Thats very interesting that record should be of type ANY. At first glance, I think that should not be happening. Thanks for flagging this!
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BioJulia/BioAlignments.jl/issues/10#issuecomment-382537541, or mute the thread https://github.com/notifications/unsubscribe-auth/AH02K9Rt27dTdXcvOW_uHdyZVZMMI4i0ks5tp7IVgaJpZM4TaWLV .
When following the iteration suggestion in the "Performance Tips" section, no type instability is observed. e.g. for:
reader = open(BAM.Reader, "data.bam") record = BAM.Record() while !eof(reader) read!(reader, record) # do something end
Variables:
#self#
genome_name::String
reader::BioAlignments.BAM.Reader{IOStream}
info::GenomicVectors.GenomeInfo{Int64}
chr::Array{String,1}
left_pos::Array{Int64,1}
right_pos::Array{Int64,1}
record::BioAlignments.BAM.Record
itemT@_9
itemT@_10
Body:
Maybe just update the docs to make this the standard way of looping over a BAM?