Inference failure on `mean` with a subarray
I am working on a PR to Missings.jl where I construct a subarray manually. See the PR here
I've been trying to debug inference issues using the mean function. I thought that my implementation was wrong. But now I see that mean is actually type-unstable in this instance!
Is there anything that I should do to recover type stability? Is that I'm doing kosher at all?
julia> x = [1, 2, missing]
3-element Vector{Union{Missing, Int64}}:
1
2
missing
julia> function nomissing_subarray(a::AbstractVector, nonmissinginds::AbstractVector)
T = nonmissingtype(eltype(a)) # Element type
N = 1 # Dimension of view
P = typeof(a) # Type of parent array
I = Tuple{typeof(nonmissinginds)} # Type of the non-missing indices
L = Base.IndexStyle(a) === IndexLinear # If the type supports fast linear indexing
SubArray{T, N, P, I, L}(a, (nonmissinginds,), 0, 1)
end;
julia> t = nomissing_subarray(x, [1, 2])
2-element view(::Vector{Union{Missing, Int64}}, [1, 2]) with eltype Int64:
1
2
julia> using Statistics
julia> @code_warntype mean(t)
MethodInstance for Statistics.mean(::SubArray{Int64, 1, Vector{Union{Missing, Int64}}, Tuple{Vector{Int64}}, false})
from mean(A::AbstractArray; dims) @ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:164
Arguments
#self#::Core.Const(Statistics.mean)
A::SubArray{Int64, 1, Vector{Union{Missing, Int64}}, Tuple{Vector{Int64}}, false}
Body::Any
1 ─ %1 = Statistics.:(var"#mean#2")(Statistics.:(:), #self#, A)::Any
└── return %1
julia> @which mean(t)
mean(A::AbstractArray; dims)
@ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:164
Update, the inference failure happens because x is a vector which may contain missing. A simpler MWE:
julia> x = Union{Int, Missing}[1, 2, 3]
3-element Vector{Union{Missing, Int64}}:
1
2
3
julia> @code_warntype mean(x)
MethodInstance for Statistics.mean(::Vector{Union{Missing, Int64}})
from mean(A::AbstractArray; dims) @ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:164
Arguments
#self#::Core.Const(Statistics.mean)
A::Vector{Union{Missing, Int64}}
Body::Any
1 ─ %1 = Statistics.:(var"#mean#2")(Statistics.:(:), #self#, A)::Any
└── return %1
I'm bummed
- that the example above doesn't give
Union{Float64, Missing}. That would help inference a lot - That my
SubArraytrick doesn't work. I thought telling the compiler theeltypeof theSubArraywasIntwould be enough for it realize there is no type instability. But that is not the case.
The SubArray issue seems to be due to the fact that nothing ensures that a value of type T is returned: getindex just calls getindex on the parent array, without a type assertion. I wonder whether it would be acceptable to add ::T in Base, but I'm afraid it would have a performance impact in the most common case where the eltype is the same as the parent.
mean being inferred as Any more generally is annoying and probably related to the promotion mechanism that we now use. This needs investigation to see whether we can find a good solution.
This also affects issue also affects the function var and std.
julia> @code_warntype std([1.,2.,missing])
MethodInstance for Statistics.std(::Vector{Union{Missing, Float64}})
from std(A::AbstractArray; corrected, mean, dims) @ Statistics /mnt/data1/abarth/opt/julia-1.9.0/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:449
Arguments
#self#::Core.Const(Statistics.std)
A::Vector{Union{Missing, Float64}}
Body::Any
1 ─ %1 = Statistics.:(var"#std#17")(true, Statistics.nothing, Statistics.:(:), #self#, A)::Any
└── return %1