Statistics.jl
Statistics.jl copied to clipboard
median(x, dims=n) is type unstable
In both Julia 1.3.1 and Julia 1.4.1 the median function is type unstable if called with the dims keyword argument.
For example
julia> @code_warntype median(rand(10), dims=1)
Variables
#unused#::Core.Compiler.Const(Statistics.var"#median##kw"(), false)
@_2::NamedTuple{(:dims,),Tuple{Int64}}
@_3::Core.Compiler.Const(Statistics.median, false)
v::Array{Float64,1}
dims::Int64
@_6::Int64
Body::Any
1 ─ %1 = Base.haskey(@_2, :dims)::Core.Compiler.Const(true, false)
│ %1
│ (@_6 = Base.getindex(@_2, :dims))
└── goto #3
2 ─ Core.Compiler.Const(:(@_6 = Statistics.:(:)), false)
3 ┄ (dims = @_6)
│ %7 = (:dims,)::Core.Compiler.Const((:dims,), false)
│ %8 = Core.apply_type(Core.NamedTuple, %7)::Core.Compiler.Const(NamedTuple{(:dims,),T} where T<:Tuple, false)
│ %9 = Base.structdiff(@_2, %8)::Core.Compiler.Const(NamedTuple(), false)
│ %10 = Base.pairs(%9)::Core.Compiler.Const(Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}(), false)
│ %11 = Base.isempty(%10)::Core.Compiler.Const(true, false)
│ %11
└── goto #5
4 ─ Core.Compiler.Const(:(Base.kwerr(@_2, @_3, v)), false)
5 ┄ %15 = Statistics.:(var"#median#47")(dims, @_3, v)::Any
└── return %15
I did a little digging and realized that the type instability comes from mapslices called by the dims version of the median function. Thus, I'll move this to base.
Possibly this should avoid mapslices, something like this?
Statistics._median(v::AbstractVector, dims::Integer) = dims==1 ? [median!(collect(v))] : collect(v)
function Statistics._median(A::AbstractArray{T,N}, dims) where {T,N}
iter = Iterators.product(ntuple(d -> (d in dims) ? axes(A,d) : (:,), Val(N))...)
[median!(A[i...]) for i in iter]::Array{T,N}
end
@code_warntype median(rand(10), dims=1)
@code_warntype median(rand(10,10), dims=1)
This is quicker for very small arrays, but doesn't matter for large ones:
julia> @btime median(r, dims=1) setup=(r=rand(10,10)); # Julia 1.4
10.725 μs (119 allocations: 4.31 KiB)
julia> @btime median(r, dims=1) setup=(r=rand(10,10)); # with this change
1.475 μs (15 allocations: 1.80 KiB)
Even if performance is similar, type stability matters as it can improve performance down the line.
Agreed. But should be the fix for this to make mapslices type stable, rather than building another custom version?
Yes but we could maybe find a simpler fix for median. mapslices may not get a lot of attention as it's inefficient anyway since it allocates copies of slices (and may be deprecated in Julia 2.0).
median only really needs map(... eachslice), it can skip the other half of mapslices which glues things back together again. But eachslice is limited to one iteration dimension right now.
map(..., eachslice) would change the shape of the output which I guess would be breaking change.
Another quick fix would be to at least type-assert that the output of median is the same as the input type for Array; e.g. median(A::Array{T, N}; dims) where {T, N} will always have return type Array{T, N}).
This would of course not work for all AbstractArray types, e.g. if the shape is part of the type signature as with StaticArrays.
While this does not fix the type stability issue, it at least prevents propagation down the line.
Agree the output shouldn't change, but you could easily reshape afterwards. But still not viable until eachslice takes multiple dims.
This appears to be fixed since Julia v1.9:
▶ julia +1.9
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.9.4 (2023-11-14)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using Statistics
julia> @code_warntype median(rand(10), dims=1)
MethodInstance for Core.kwcall(::NamedTuple{(:dims,), Tuple{Int64}}, ::typeof(median), ::Vector{Float64})
from kwcall(::Any, ::typeof(median), v::AbstractArray) @ Statistics ~/.julia/juliaup/julia-1.9.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:864
Arguments
_::Core.Const(Core.kwcall)
@_2::NamedTuple{(:dims,), Tuple{Int64}}
@_3::Core.Const(Statistics.median)
v::Vector{Float64}
Locals
dims::Union{}
@_6::Int64
Body::Vector{Float64}