Missing values support
I tested out this package on 0.7 with missings and got the following error:
I think this is because Statistics is not imported by NaNMath, so the function calls to NaNMath.mean can't fall base on Statistics.mean.
I think the solution here is to add import Statistics and add fall backs for operations of type Any so those are called when there are missing values. Does that seem feasible?
nm.mean([1.,2., missing])
> ERROR: WARNING: Base.mean is deprecated: it has been moved to the standard library package `Statistics`.
Add `using Statistics` to your imports.
in module Base
MethodError: no method matching mean(::Array{Union{Missing, Float64},1})
You may have intended to import Base.mean
Closest candidates are:
mean(::AbstractArray{T<:AbstractFloat,N} where N) where T<:AbstractFloat at /Users/peterdeffebach/.julia/packages/NaNMath/pEda/src/NaNMath.jl:167
Stacktrace:
[1] top-level scope at none:0
I encountered a similar issue when using this package with missing values in 0.6.4:
julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=16)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> using NaNMath
julia> NaNMath.log(-100)
NaN
julia> NaNMath.log(missing)
ERROR: MethodError: no method matching log(::Missings.Missing)
You may have intended to import Base.log
Closest candidates are:
log(::Float32) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:12
log(::Float64) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:11
log(::Real) at /Users/martink/.julia/v0.6/NaNMath/src/NaNMath.jl:13
...
julia> Base.log(missing)
missing
Would it make sense to define functions in this packages to return missing for missing input values? This would make sense to me, as missing and NaN here represent two different scenarios: NaN stands for invalid input (e.g., negative values for log), whereas missing stands for entirely absent input.
log here could be defined for Union{Missing, Float64} input types and have a simple ismissing check within its definition. I'm happy to work on this if this is a desired feature of the package.
I see how this is useful, but handling missing extends beyond the problem that this package was designed to address and that I have interest in maintaining. Maybe MissingNaNMath is in order? :)
Fair enough!
But - given that missing is part of Base Julia, is there any use for a numerical calculation package that does not support it? That sounds quite dangerous.
is there any use for a numerical calculation package that does not support it? That sounds quite dangerous.
This comment is a bit hyperbolic. The package is already being used by upstream AD packages like ForwardDiff and JuMP (https://github.com/JuliaOpt/JuMP.jl/blob/82e998f72e849f7a5f8c264accb41d9eafbcc845/src/Derivatives/Derivatives.jl#L15) for well-defined purposes that currently don't require any support for missing.
I see you're right this package is still equally useful in that context. I guess I'm just disappointed as I thought this could be a more broadly useful package, and most packages that would depend on this will need to be able to support all the numerical functionality defined in Base for these functions.
My suggestion that it might be "dangerous" came from the notion that users would expect Base.mean and NaNMath.mean to behave similarly except in their treatment of NaN.
One solution that is missing-agnostic is to have fallbacks for non-float types that revert to the functions defined in Base. But you might want to throw errors when non-floats are used given the purpose of NaNMath.
I view a MethodError as pretty low on the dangerousness scale. Silently not handling NaNs as promised is more dangerous, e.g., https://github.com/mlubin/NaNMath.jl/pull/21#issuecomment-305064958.
Anyway I'm willing to share maintainership if there's interest in making the package more useful for additional applications.
You're right. After looking into it I'm also less certain that the right approach is to change this to treat missing like NaN. Sorry for the noise.
It would make sense to accept ~arrays~ iterables of any element type though, and only throw an error when actually encountering a value of an unsupported type. That's what Statistics.mean does, and it's useful if you want to pass skipmissing(x) to NaNMath.mean.