StatsBase.jl icon indicating copy to clipboard operation
StatsBase.jl copied to clipboard

Questions about `Weights`

Open ParadaCarleton opened this issue 3 years ago • 10 comments

I have two main questions about the current implementation of weights:

  1. Why are they mutable?
  2. Why are S and T of different type? It feels very strange to have the sum potentially be of a different type than the individual weights themselves, and seems to introduce an extra unnecessary type parameter.

ParadaCarleton avatar Dec 11 '21 04:12 ParadaCarleton

1.) mutability is used to keep the sum updated

mschauer avatar Dec 11 '21 11:12 mschauer

1.) mutability is used to keep the sum updated

Hmm, I think that just begs the question, since the sum should only change if some of the values change.

ParadaCarleton avatar Dec 11 '21 21:12 ParadaCarleton

It feels very strange to have the sum potentially be of a different type than the individual weights themselves, and seems to introduce an extra unnecessary type parameter.

Generally, sum(x) isa eltype(x) is not guaranteed. For instance,

julia> x = [true, false, false, true];

julia> typeof(x)
Vector{Bool} (alias for Array{Bool, 1})

julia> sum(x)
2

julia> typeof(sum(x))
Int64

devmotion avatar Dec 19 '21 22:12 devmotion

Mutability was added because @rofinn had a use case for that, see previous discussion at https://github.com/JuliaStats/StatsBase.jl/issues/397. That feels natural to me, but in practice it's a bit annoying for views (see https://github.com/JuliaStats/StatsBase.jl/pull/723). So recently I've been wondering whether we should disallow it. Do you have any particular reason to do that too?

nalimilan avatar Dec 21 '21 13:12 nalimilan

Mutability was added because @rofinn had a use case for that, see previous discussion at #397. That feels natural to me, but in practice it's a bit annoying for views (see #723). So recently I've been wondering whether we should disallow it. Do you have any particular reason to do that too?

Nope! Just mildly curious given immutability is the default and generally preferred.

ParadaCarleton avatar Dec 22 '21 18:12 ParadaCarleton

Yeah, making the types mutable was really just the path of least resistance for writing memory efficient algorithms. If it's becoming too cumbersome to handle, particularly when working with views, then I don't see why we can't disallow it. For the most part I can probably just switch our algs to use regular vectors instead.

rofinn avatar Dec 22 '21 18:12 rofinn

Yeah, making the types mutable was really just the path of least resistance for writing memory efficient algorithms. If it's becoming too cumbersome to handle, particularly when working with views, then I don't see why we can't disallow it. For the most part I can probably just switch our algs to use regular vectors instead.

Shouldn't the values of the weights still be mutable (since the weights are stored in a mutable vector)? If you normalize the weights to have an unchanged sum, you should be able to mutate the weights in-place.

ParadaCarleton avatar Dec 23 '21 02:12 ParadaCarleton

If you normalize the weights to have an unchanged sum, you should be able to mutate the weights in-place.

Normalization means that you have to change all weights instead of just one and potentially run into type issues. E.g., you could use weights Int[1, 1, 2]; then increasing the first weight to 2 would lead to the weights Int[2, 1, 2] - but if you want to keep the same sum you would have to use eg. [8//5, 4//5, 8//5] or [1.6, 0.8, 1.6], ie you would have to update all weights and can't use integer weights anymore.

devmotion avatar Dec 23 '21 07:12 devmotion

If you normalize the weights to have an unchanged sum, you should be able to mutate the weights in-place.

Normalization means that you have to change all weights instead of just one and potentially run into type issues. E.g., you could use weights Int[1, 1, 2]; then increasing the first weight to 2 would lead to the weights Int[2, 1, 2] - but if you want to keep the same sum you would have to use eg. [8//5, 4//5, 8//5] or [1.6, 0.8, 1.6], ie you would have to update all weights and can't use integer weights anymore.

Theoretically possible, but that sounds like it could be fixed pretty easily by using float weights.

ParadaCarleton avatar Dec 23 '21 17:12 ParadaCarleton

No, it can't, you would still have to update the whole array instead of a single entry. There are also good reasons for using integers instead of floats, eg they don't cause undesired promotions and you don't have to deal with floating point issues.

devmotion avatar Dec 23 '21 17:12 devmotion