Statistics.jl icon indicating copy to clipboard operation
Statistics.jl copied to clipboard

median(x, dims=n) is type unstable

Open lassepe opened this issue 5 years ago • 8 comments

In both Julia 1.3.1 and Julia 1.4.1 the median function is type unstable if called with the dims keyword argument.

For example

julia> @code_warntype median(rand(10), dims=1)
Variables
  #unused#::Core.Compiler.Const(Statistics.var"#median##kw"(), false)
  @_2::NamedTuple{(:dims,),Tuple{Int64}}
  @_3::Core.Compiler.Const(Statistics.median, false)
  v::Array{Float64,1}
  dims::Int64
  @_6::Int64

Body::Any
1 ─ %1  = Base.haskey(@_2, :dims)::Core.Compiler.Const(true, false)
│         %1
│         (@_6 = Base.getindex(@_2, :dims))
└──       goto #3
2 ─       Core.Compiler.Const(:(@_6 = Statistics.:(:)), false)
3 ┄       (dims = @_6)
│   %7  = (:dims,)::Core.Compiler.Const((:dims,), false)
│   %8  = Core.apply_type(Core.NamedTuple, %7)::Core.Compiler.Const(NamedTuple{(:dims,),T} where T<:Tuple, false)
│   %9  = Base.structdiff(@_2, %8)::Core.Compiler.Const(NamedTuple(), false)
│   %10 = Base.pairs(%9)::Core.Compiler.Const(Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}(), false)
│   %11 = Base.isempty(%10)::Core.Compiler.Const(true, false)
│         %11
└──       goto #5
4 ─       Core.Compiler.Const(:(Base.kwerr(@_2, @_3, v)), false)
5 ┄ %15 = Statistics.:(var"#median#47")(dims, @_3, v)::Any
└──       return %15

lassepe avatar May 09 '20 18:05 lassepe

I did a little digging and realized that the type instability comes from mapslices called by the dims version of the median function. Thus, I'll move this to base.

lassepe avatar May 25 '20 07:05 lassepe

Possibly this should avoid mapslices, something like this?

Statistics._median(v::AbstractVector, dims::Integer) = dims==1 ? [median!(collect(v))] : collect(v)

function Statistics._median(A::AbstractArray{T,N}, dims) where {T,N}
    iter = Iterators.product(ntuple(d -> (d in dims) ? axes(A,d) : (:,), Val(N))...)
    [median!(A[i...]) for i in iter]::Array{T,N}
end

@code_warntype median(rand(10), dims=1)
@code_warntype median(rand(10,10), dims=1)

This is quicker for very small arrays, but doesn't matter for large ones:

julia> @btime median(r, dims=1) setup=(r=rand(10,10)); # Julia 1.4
  10.725 μs (119 allocations: 4.31 KiB)

julia> @btime median(r, dims=1) setup=(r=rand(10,10)); # with this change
  1.475 μs (15 allocations: 1.80 KiB)

mcabbott avatar May 25 '20 13:05 mcabbott

Even if performance is similar, type stability matters as it can improve performance down the line.

nalimilan avatar Jun 07 '20 14:06 nalimilan

Agreed. But should be the fix for this to make mapslices type stable, rather than building another custom version?

lassepe avatar Jun 07 '20 14:06 lassepe

Yes but we could maybe find a simpler fix for median. mapslices may not get a lot of attention as it's inefficient anyway since it allocates copies of slices (and may be deprecated in Julia 2.0).

nalimilan avatar Jun 07 '20 15:06 nalimilan

median only really needs map(... eachslice), it can skip the other half of mapslices which glues things back together again. But eachslice is limited to one iteration dimension right now.

mcabbott avatar Jun 07 '20 18:06 mcabbott

map(..., eachslice) would change the shape of the output which I guess would be breaking change.

Another quick fix would be to at least type-assert that the output of median is the same as the input type for Array; e.g. median(A::Array{T, N}; dims) where {T, N} will always have return type Array{T, N}).

This would of course not work for all AbstractArray types, e.g. if the shape is part of the type signature as with StaticArrays. While this does not fix the type stability issue, it at least prevents propagation down the line.

lassepe avatar Jun 08 '20 12:06 lassepe

Agree the output shouldn't change, but you could easily reshape afterwards. But still not viable until eachslice takes multiple dims.

mcabbott avatar Jun 08 '20 16:06 mcabbott

This appears to be fixed since Julia v1.9:

▶ julia +1.9
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.4 (2023-11-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Statistics

julia> @code_warntype median(rand(10), dims=1)
MethodInstance for Core.kwcall(::NamedTuple{(:dims,), Tuple{Int64}}, ::typeof(median), ::Vector{Float64})
  from kwcall(::Any, ::typeof(median), v::AbstractArray) @ Statistics ~/.julia/juliaup/julia-1.9.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:864
Arguments
  _::Core.Const(Core.kwcall)
  @_2::NamedTuple{(:dims,), Tuple{Int64}}
  @_3::Core.Const(Statistics.median)
  v::Vector{Float64}
Locals
  dims::Union{}
  @_6::Int64
Body::Vector{Float64}

mbauman avatar Aug 27 '24 21:08 mbauman