Missings.jl icon indicating copy to clipboard operation
Missings.jl copied to clipboard

Use broadcasting instead of `convert()` for `disallowmissing()`

Open kescobo opened this issue 4 years ago • 3 comments

From discussion on Slack with @pdeffebach, broadcasting seems to be a bit faster

using Missings, BenchmarkTools

v = Vector{Union{Float64, Missing}}(rand(10000));

@benchmark disallowmissing($v)

@benchmark Float64.($v)

image

kescobo avatar Jul 21 '21 16:07 kescobo

Hmm - actually, it must be whatever else disallowmissing is doing, since

@benchmark convert.(Float64, $v)

is about the same:

image

kescobo avatar Jul 21 '21 16:07 kescobo

The reason Missings.jl uses convert(AbstractArray{T}, ...) is because of categorical arrays and other non-Vector{T} array types.

But for Vector{T} we can define a more specialized method that is faster.

pdeffebach avatar Jul 22 '21 13:07 pdeffebach

But for Vector{T} we can define a more specialized method that is faster.

Note that the benchmark above uses convert.(Float64, $v), not convert(Vector{Float64}, $v). The latter seems to be faster than convert(AbstractVector{Float64}, $v), but still slower than broadcasting.

I think these performance differences should be reported against Julia. disallowmissing isn't the problem here, it's definition is trivial.

(disallowmissing cannot use broadcasting as the element type should be preserved disregarding the actual contents of the vector.)

nalimilan avatar Jul 23 '21 15:07 nalimilan