DataStructures.jl icon indicating copy to clipboard operation
DataStructures.jl copied to clipboard

rand() for multisets

Open hwjsnc opened this issue 4 years ago • 4 comments

This package does not currently define Random.rand(::Accumulator) so the default implementation for an AbstractDictionary is used, returning a random entry x=>count of the underlying dictionary. For example, rand(DataStructures.Accumulator(Dict(2=>4))) returns 2 => 4.

In my opinion, rand should return one of the values/keys with a probability proportional to its corresponding count.

If the maintainers agree, I can implement this and prepare a pull request.

Counterpoints: This only makes sense if all counts are nonnegative and finite. The implementation would either have to store the total count and update it for every modification, or recompute it every time rand is called. Either choice will be inappropriate for some applications.

hwjsnc avatar Nov 26 '20 16:11 hwjsnc

That seems like the correct behavour for a generalized multiset, yes. And accumulator does currently act as a generalized multiset (i.e a multiset where some items may have fractional count).

There is currently a open PR looking at splitting the multiset behavour into a seperate type. #691

oxinabox avatar Nov 27 '20 16:11 oxinabox

Accumulators hold negative values as well (intentionally, as far as I can tell), so this change does seem better suited for a dedicated multiset type.

hwjsnc avatar Nov 28 '20 09:11 hwjsnc

We have a negative multiplicity error that we throw when you try a multiset operation on a Accumulator with negative values

oxinabox avatar Nov 28 '20 09:11 oxinabox

This is only true for the multiset operations, right? The basic functions like inc! and dec! seem to allow negative values, and I imagine this to be useful sometimes. So I suppose random sampling for accumulators makes no sense in general, and should be reserved for the proper multiset type added with the previously mentioned pull request. I'll change the title accordingly.

hwjsnc avatar Nov 28 '20 13:11 hwjsnc