RandomExtensions.jl

sprand as samples from sparse distribution

Open abraunst opened this issue 6 years ago • 3 comments

I was thinking, in the spirit of this package, maybe rand(Normal(),SparseMatrixCSC,p,m,n) could be better expressed as rand(Bernoulli(p,Normal()),SparseMatrixCSC,m,n), where Bernoulli(p, Normal()) would be the "Gauss-Bernoulli" or "Spike-and-Slab" mixture distribution:

P(x) = (1-p) delta(x) + p Normal(x)

It seems to make things a bit more generic.
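A minimal sketch of what drawing a single value from this mixture means, assuming a hypothetical `spike_slab` helper (not part of RandomExtensions or any other package) and using the stdlib `randn` as a stand-in for `Normal()`:

```julia
using Random

# Hypothetical helper: with probability p draw from the standard normal
# (the "slab"), otherwise return an exact zero (the "spike").
spike_slab(rng::AbstractRNG, p::Real) = rand(rng) < p ? randn(rng) : 0.0

rng = MersenneTwister(42)
xs = [spike_slab(rng, 0.1) for _ in 1:100_000]

# The fraction of non-zero draws should be close to p = 0.1.
frac_nonzero = count(!iszero, xs) / length(xs)
```

Since the normal part is continuous, a draw is non-zero with probability p, so p plays the same role as the density argument of sprand.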

abraunst, Apr 03 '19 21:04

Sounds like a very interesting idea. It would require the non-zero structure to depend on the values (i.e. testing each produced value for nullity), so I wonder whether the possible multiple allocations would have a negative performance impact. But it is definitely worth exploring. It may even be possible to support both APIs (for now I guess I prefer not to get rid of the current API, as it feels closer to the sprand API and probably makes it easier to switch).
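To make the concern concrete, here is a rough sketch of a value-driven sampler, assuming a hypothetical function `naive_spike_slab` (not an existing API). It draws every entry from the mixture, tests it for zero, and grows the index and value vectors as it goes:

```julia
using Random, SparseArrays

# Hypothetical naive sampler: the non-zero structure is discovered only by
# testing each drawn value. The vectors grow with push!, which is where
# repeated reallocations can occur.
function naive_spike_slab(rng::AbstractRNG, p::Real, m::Integer, n::Integer)
    I, J, V = Int[], Int[], Float64[]
    for j in 1:n, i in 1:m
        x = rand(rng) < p ? randn(rng) : 0.0   # one draw from the mixture
        if !iszero(x)
            push!(I, i); push!(J, j); push!(V, x)
        end
    end
    return sparse(I, J, V, m, n)
end

A = naive_spike_slab(MersenneTwister(1), 0.05, 200, 300)
```

Note that this visits all m*n positions regardless of p, so its cost does not shrink as the matrix gets sparser.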

rfourquet, Apr 07 '19 09:04

> Sounds like a very interesting idea. It would require the non-zero structure to depend on the values (i.e. testing each produced value for nullity), so I wonder whether the possible multiple allocations would have a negative performance impact. But it is definitely worth exploring. It may even be possible to support both APIs (for now I guess I prefer not to get rid of the current API, as it feels closer to the sprand API and probably makes it easier to switch).

For sure, if the p value is small enough, the sampler should do what the current sprand does, i.e. extract the non-zero indices and then fill them. I confess that I tried to implement it in RandomExtensions but I am still a bit lost in the design :sweat_smile: .
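The index-first strategy described here could look roughly like the following sketch. The function name `indexfirst_spike_slab` is hypothetical, and this is not how sprand or RandomExtensions is actually implemented. It only assumes the stdlib `randsubseq`, which keeps each element of a collection independently with probability p:

```julia
using Random, SparseArrays

# Hypothetical index-first sampler: choose which linear positions are
# non-zero first, then fill only those positions with slab values.
function indexfirst_spike_slab(rng::AbstractRNG, p::Real, m::Integer, n::Integer)
    pos = randsubseq(rng, 1:m*n, p)      # linear indices kept with probability p
    I = [mod1(k, m) for k in pos]        # row of each kept position
    J = [fld1(k, m) for k in pos]        # column of each kept position
    V = randn(rng, length(pos))          # one normal draw per kept position
    return sparse(I, J, V, m, n)
end

B = indexfirst_spike_slab(MersenneTwister(1), 0.01, 500, 400)
```

The work here scales with the expected number of non-zeros (about p*m*n) rather than with m*n, which is why this route is attractive when p is small.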

In any case, even if RandomExtensions makes its way into stdlib (which I would love to see), I think that the current interface in stdlib should be left as convenience functions (maybe without the rfn param and other bells and whistles).

abraunst, Apr 07 '19 09:04

> I tried to implement it in RandomExtensions

Cool!!

> I tried to implement it in RandomExtensions but I am still a bit lost in the design

Sorry for that, the internals evolved quite a bit the last time I worked on it, and I haven't documented them yet. Feel free to open an issue to ask for help, and I will answer there or write documentation (but I will have very little time in the upcoming week).

> I think that the current interface in stdlib should be left

I don't have much hope that sprand will go away. I agree that sprand is more convenient than rand([T], SparseVector, p, n, m); that's why I initially (in the Base PR) added the short version rand([T], p, n, m), to give it a chance to compete favorably against sprand. But IIRC, someone noted somewhere that this short version is not very clear, so this is not a unanimous solution!

rfourquet, Apr 07 '19 11:04