StatsBase.jl
StatsBase.jl copied to clipboard
Questions about `Weights`
I have two main questions about the current implementation of weights:
- Why are they mutable?
- Why are S and T of different type? It feels very strange to have the sum potentially be of a different type than the individual weights themselves, and seems to introduce an extra unnecessary type parameter.
1.) mutability is used to keep the sum updated
1.) mutability is used to keep the sum updated
Hmm, I think that just begs the question, since the sum should only change if some of the values change.
It feels very strange to have the sum potentially be of a different type than the individual weights themselves, and seems to introduce an extra unnecessary type parameter.
Generally, sum(x) isa eltype(x)
is not guaranteed. For instance,
julia> x = [true, false, false, true];
julia> typeof(x)
Vector{Bool} (alias for Array{Bool, 1})
julia> sum(x)
2
julia> typeof(sum(x))
Int64
Mutability was added because @rofinn had a use case for that, see previous discussion at https://github.com/JuliaStats/StatsBase.jl/issues/397. That feels natural to me, but in practice it's a bit annoying for views (see https://github.com/JuliaStats/StatsBase.jl/pull/723). So recently I've been wondering whether we should disallow it. Do you have any particular reason to do that too?
Mutability was added because @rofinn had a use case for that, see previous discussion at #397. That feels natural to me, but in practice it's a bit annoying for views (see #723). So recently I've been wondering whether we should disallow it. Do you have any particular reason to do that too?
Nope! Just mildly curious given immutability is the default and generally preferred.
Yeah, making the types mutable was really just the path of least resistance for writing memory efficient algorithms. If it's becoming too cumbersome to handle, particularly when working with views, then I don't see why we can't disallow it. For the most part I can probably just switch our algs to use regular vectors instead.
Yeah, making the types mutable was really just the path of least resistance for writing memory efficient algorithms. If it's becoming too cumbersome to handle, particularly when working with views, then I don't see why we can't disallow it. For the most part I can probably just switch our algs to use regular vectors instead.
Shouldn't the values of the weights still be mutable (since the weights are stored in a mutable vector)? If you normalize the weights to have an unchanged sum, you should be able to mutate the weights in-place.
If you normalize the weights to have an unchanged sum, you should be able to mutate the weights in-place.
Normalization means that you have to change all weights instead of just one and potentially run into type issues. E.g., you could use weights Int[1, 1, 2]
; then increasing the first weight to 2
would lead to the weights Int[2, 1, 2]
- but if you want to keep the same sum you would have to use eg. [8//5, 4//5, 8//5]
or [1.6, 0.8, 1.6]
, ie you would have to update all weights and can't use integer weights anymore.
If you normalize the weights to have an unchanged sum, you should be able to mutate the weights in-place.
Normalization means that you have to change all weights instead of just one and potentially run into type issues. E.g., you could use weights
Int[1, 1, 2]
; then increasing the first weight to2
would lead to the weightsInt[2, 1, 2]
- but if you want to keep the same sum you would have to use eg.[8//5, 4//5, 8//5]
or[1.6, 0.8, 1.6]
, ie you would have to update all weights and can't use integer weights anymore.
Theoretically possible, but that sounds like it could be fixed pretty easily by using float weights.
No, it can't, you would still have to update the whole array instead of a single entry. There are also good reasons for using integers instead of floats, eg they don't cause undesired promotions and you don't have to deal with floating point issues.