OnlineStats.jl
Type consistency
I have found this repo recently, and as I am integrating it into my code, I noticed that a lot of type information is lost.
E.g.
> eltype(Mean)
Any
which is surprising, given that Mean has a <:Number type parameter. I personally would expect that
> eltype(Mean(Float32))
Float32
Surprisingly, other objects like FitNormal don't accept a type argument, even though FitNormal is parametrized with V<:Variance, so one might expect something like
> tmp = FitNormal(Float32)
FitNormal{Variance{Float32, Float32, EqualWeight}}: n=0 | value=(0.0, 1.0)
> eltype(tmp)
Float32
to work.
I am not sure when I would have time to work on something like this, but I first wanted to open this issue, and see if the above would be a desired behaviour.
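To make the expectation concrete, here is a minimal sketch using a hypothetical `ToyMean` type. It is a stand-in defined only for illustration, not the actual OnlineStats `Mean`, and it assumes the element type should simply be the type parameter:

```julia
# Hypothetical stand-in for a parametrized statistic; NOT the OnlineStats.jl type.
mutable struct ToyMean{T<:Number}
    μ::T
    n::Int
end

# Convenience constructors: pass the element type, or default to Float64.
ToyMean(::Type{T}) where {T<:Number} = ToyMean{T}(zero(T), 0)
ToyMean() = ToyMean(Float64)

# Report the type parameter as the element type. Base's instance fallback,
# eltype(x) = eltype(typeof(x)), then covers instances as well.
Base.eltype(::Type{ToyMean{T}}) where {T} = T

eltype(ToyMean(Float32))   # Float32
eltype(ToyMean())          # Float64
```

Whether something like this belongs in the OnlineStatsBase interface is exactly the open question; the sketch only shows that the definition itself is a one-liner.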
I'm not sure what you mean by type info is lost. eltype is used primarily for iteration, which isn't defined (e.g. for i in Mean()... is an error)
To your second point, FitNormal(Variance(Float32)) works, but I suppose the shorter FitNormal(T) would be nice to have.
In my specific use case I am using EnsembleProblem from SciML and reducing the results with OnlineStats, as I want to compute a lot of trajectories in a way that doesn't blow up my RAM.
My current implementation returns a Vector{<:OnlineStat} for the trajectory (which may or may not be the best option, but we will see)
However, when constructing the solution object, an eltype(eltype(T)) call happens, which makes the solution parametrized with Any, which is not great.
Long story short, I had
> eltype(eltype(Float64))
Float64
as reference for the behaviour I had been expecting, and was hence surprised.
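For reference, the Base behaviour behind that expectation can be checked with Base functions alone. The `Opaque` struct below is a hypothetical placeholder for any type that does not define its own `eltype` method:

```julia
# Numbers are iterable in Julia, so a number type is its own element type.
@assert eltype(Float64) == Float64
@assert eltype(eltype(Float64)) == Float64

# Nested containers peel off one layer per call.
@assert eltype(Vector{Vector{Float32}}) == Vector{Float32}
@assert eltype(eltype(Vector{Vector{Float32}})) == Float32

# Hypothetical type without an eltype method: the generic fallback is Any.
struct Opaque{T}
    x::T
end
@assert eltype(Opaque{Float32}) == Any
@assert eltype(eltype(Vector{Opaque{Float32}})) == Any
```

The last line mirrors the situation described above: a vector of objects whose type carries a numeric parameter still ends up with Any after two eltype calls.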
Hmm, okay.
Where is the eltype(eltype(T)) happening/why is that necessary? I'm trying to understand the use case since OnlineStats aren't iterable to begin with.
I'm not sure what a "trajectory" is in this context, but maybe you want to use value.(trajectory) instead of the stats directly?
A trajectory in the ODE/dynamical system sense, where one might have m states, each with dimension d.
This could be a scalar ODE, so each state would be a Number, or something higher dimensional, in which case each state is a Vector{<:Number}. The whole trajectory is then a Vector{<:Number} or a Vector{Vector{<:Number}}.
Now, for something like an SDE, each solution might be slightly different, and one wants summary statistics for a (large) collection of trajectories for the distribution of states at each time step.
The way I went about this is to have a Vector{<:OnlineStat}, i.e. by doing [FitNormal() for _ in 1:m], and to add trajectories via broadcasting. Once the simulation is done, I can nicely get the values out by broadcasting mean.(...), cov.(...) or similar.
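A rough sketch of that pattern, assuming scalar states and using random numbers as a stand-in for solver output (the values of `m` and the number of trajectories are arbitrary, and the SciML side is left out entirely):

```julia
using OnlineStats
using Statistics   # brings `mean` into scope

m = 5                                 # number of time steps per trajectory (arbitrary)
stats = [FitNormal() for _ in 1:m]    # one running normal fit per time step

# Stand-in for an ensemble of solutions: each trajectory is a vector of m scalars.
for _ in 1:1_000
    trajectory = randn(m)
    fit!.(stats, trajectory)          # update stat i with the state at step i
end

means = mean.(stats)                  # one mean per time step
vals = value.(stats)                  # the raw values of each FitNormal
```

Only the m statistics are kept in memory, regardless of how many trajectories are fitted.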
I suppose I could do this via Group, but it does not seem like there is a great constructor for large groups (but I might have missed something).
Even then, if I do something like
> g = Group(FitNormal(), FitNormal())
> fit!(g, rand(2))
I can't get the means out as easily, since neither mean.(g) nor mean(g) works, so I have to go via value.
Further, even though Group is iterable, we again get
> eltype(g)
Any
This is sensible, since a group could contain anything, but in a case like this, where all stats in the group are the same, one might expect a more specific eltype.
Also comparing to Distributions:
> eltype(Distributions.Normal(2.f0))
Float32
> eltype(Distributions.MvNormal([2.f0, 3.f0]))
Float32
Given that FitNormal and Normal otherwise function quite similarly, it is again surprising to see a difference here.
I think that eltypes are quite useful beyond iterating to indicate what kind of data is wrapped in an object.
Thanks for the info!
I'll have to mull this over a bit since I'd rather not add methods to the OnlineStatsBase interface if I can avoid it.
I just took a stab at creating a convenience constructor (see #258), but stumbled over additional surprising behaviour.
First, the internal type of FitMvNormal is fixed to CovMatrix{Float64}, and second the fallback does not incorporate type information even when it can be specified (i.e. for FitNormal).
julia> m = FitNormal(Variance(Float32))
FitNormal: n=0 | value=(0.0, 1.0)
julia> typeof(value(m))
Tuple{Float64, Float64}
julia> for _ in 1:3
           fit!(m, rand(Float32))
       end
julia> m
FitNormal: n=3 | value=(0.482926, 0.478244)
julia> typeof(value(m))
Tuple{Float32, Float32}
I also note that
julia> typeof(m.v)
Variance{Float32, Float32, EqualWeight}
which suggests that it is possible to have a Float32 mean and a Float64 variance?
I have made an attempt to fix the above, let me know what you think.
On that note, I am using Float32/Float64 as placeholders; they could also be replaced with any new user-defined type NewScalarNumberType <: Real. This might be quite interesting.