RandomExtensions.jl
sprand as samples from sparse distribution
I was thinking that, in the spirit of this package, rand(Normal(),SparseMatrixCSC,p,m,n) might be better expressed as rand(Bernoulli(p,Normal()),SparseMatrixCSC,m,n), where Bernoulli(p, Normal()) would be the "Gauss-Bernoulli" or "Spike-and-Slab" mixture distribution:
P(x) = (1-p) delta(x) + p Normal(x)
It seems to make things a bit more generic.
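To make the proposal concrete, here is a minimal sketch of sampling from the mixture above and collecting the result into a sparse matrix. The names spike_slab and spike_slab_matrix are hypothetical helpers written for illustration; they are not part of RandomExtensions or SparseArrays, and this is not how the package implements anything.

```julia
using Random
using SparseArrays

# Hypothetical helper: one draw from P(x) = (1-p) delta(x) + p Normal(x).
# With probability p take a standard normal value, otherwise return exactly 0.
spike_slab(rng::AbstractRNG, p::Real) = rand(rng) < p ? randn(rng) : 0.0

# Naive value-first sampling: draw every entry, then keep only the non-zeros.
# This allocates a dense m-by-n matrix, so it only illustrates the semantics.
function spike_slab_matrix(rng::AbstractRNG, p::Real, m::Integer, n::Integer)
    dense = [spike_slab(rng, p) for i in 1:m, j in 1:n]
    return sparse(dense)  # conversion stores only the non-zero entries
end

rng = MersenneTwister(1)
A = spike_slab_matrix(rng, 0.1, 200, 300)
```

Here nnz(A) / length(A) should come out close to p, which is the same role that the density argument plays in sprand.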
Sounds like a very interesting idea. It would require the non-zero structure to depend on the values (i.e. testing each produced value for nullity), so I wonder whether the possible multiple allocations would have a negative performance impact. But it is definitely worth exploring. It may even be possible to support both APIs (for now I guess I prefer not to get rid of the current API, as it feels closer to the sprand API and probably makes it easier to switch).
For sure, if the p value is small enough, the sampler should do what the current sprand does, i.e. extract the non-zero indices and then fill them. I confess that I tried to implement it in RandomExtensions but I am still a bit lost in the design :sweat_smile:.
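The index-first strategy mentioned here can be sketched as follows: pick the non-zero positions first, then fill only those positions with normal values. The function spike_slab_sparse is a hypothetical name used for illustration, and the approach is only an approximation of what sprand does internally, not a copy of it.

```julia
using Random
using SparseArrays

# Hypothetical helper: choose the non-zero positions first, then fill them.
function spike_slab_sparse(rng::AbstractRNG, p::Real, m::Integer, n::Integer)
    # Keep each linear index 1:m*n independently with probability p.
    idx = randsubseq(rng, 1:(m * n), p)
    # Column-major conversion from linear index to (row, column).
    I = [mod(k - 1, m) + 1 for k in idx]
    J = [div(k - 1, m) + 1 for k in idx]
    # Draw values only for the selected positions.
    V = randn(rng, length(idx))
    return sparse(I, J, V, m, n)
end

rng = MersenneTwister(1)
B = spike_slab_sparse(rng, 0.01, 1000, 1000)
```

The work and memory here scale with the number of stored entries rather than with m*n, which is why this route is attractive when p is small.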
In any case, even if RandomExtensions makes its way into stdlib (which I would love to see), I think that the current interface in stdlib should be left as convenience functions (though maybe without the rfn param and other bells and whistles).
I tried to implement it in RandomExtensions
Cool!!
I tried to implement it in RandomExtensions but I am still a bit lost in the design
Sorry about that, the internals evolved quite a bit the last time I worked on them, and I haven't documented them yet. Feel free to open an issue to ask for help, and I will answer there or write documentation (but I will have very little time in the upcoming week).
I think that the current interface in stdlib should be left
I don't have much hope that sprand will go away. I agree that sprand is more convenient than rand([T], SparseVector, p, n, m), which is why I initially (in the Base PR) added the short version rand([T], p, n, m), to give it a chance to compete favorably against sprand. But IIRC, someone noted somewhere that this short version is not very clear, so it is not a unanimous solution!