
`rand(d, 1000)` is not propagated to components

Open mmikhasenko opened this issue 6 months ago • 3 comments

Sometimes it is more efficient to generate a sample with `rand(d, 1000)` than with `[rand(d) for _ in 1:1000]`, when the per-call overhead is significant. That is the case for `fft_convolve` in NumericalDistributions.jl.

A solution would be to propagate the `rand(rng, d, n::Int)` signature to the wrapped distributions in Distributions.jl.

It concerns the wrapper types:

  • MixtureModel,
  • Truncated

Here is my example for testing, from https://github.com/mmikhasenko/NumericalDistributions.jl/issues/10:

```julia
using Distributions, NumericalDistributions

dt = let
	d1 = truncated(Normal(0, 0.02), -1.0, 1.0)
	d2 = truncated(Normal(2, 0.02), 1.0, 4.0)
	d = fft_convolve(d1, d2)
	truncated(d, -1, 3)
end

@time rand(dt.untruncated, 1000) # ~ms
@time rand(dt, 1000) # ~5 s
```

mmikhasenko avatar Jun 13 '25 12:06 mmikhasenko

> A solution would be to propagate the `rand(rng, d, n::Int)` signature to the wrapped distributions in Distributions.jl.
>
> It concerns the wrapper types:
>
>   • MixtureModel,
>   • Truncated

For these wrappers, in general you can't "just" propagate `rand` calls with `n::Int` to the underlying samplers.

For mixture models, each of the n variates could come from a different mixture component; so (naively) one would have to sample n mixture components, group the variates by component, and call `rand!` on each component for its group. Depending on the number of variates, the number of mixture components, and the component distributions, this might be more performant than the current implementation, but it seems this would require more detailed investigation and benchmarking.

For truncated distributions, whether rejection sampling or an inverse-(log)cdf sampler is used (or something more specialized, as for truncated normal distributions) depends on the probability mass of the truncation interval. It seems clearer that batch-sampling implementations should be added in this case, but they shouldn't just be propagated to the untruncated distribution.
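As a toy illustration of the inverse-cdf route mentioned above, here is a Base-Julia sketch for Exponential(1), where the quantile has a closed form. This is illustrative only and not the Distributions.jl implementation; the interval bounds and sample size are arbitrary.

```julia
# Sketch: inverse-cdf batch sampling from Exponential(1) truncated to [lo, hi].
# F(x) = 1 - exp(-x), so F⁻¹(p) = -log(1 - p) in closed form.
lo, hi = 0.5, 2.0
Flo = 1 - exp(-lo)   # cdf at the lower truncation bound
Fhi = 1 - exp(-hi)   # cdf at the upper truncation bound
n = 1000
u = rand(n)                                   # uniforms in [0, 1)
# Map the uniforms into [F(lo), F(hi)], then apply the quantile function;
# every draw lands inside the truncation interval by construction.
x = -log.(1 .- (Flo .+ u .* (Fhi - Flo)))
```

The same one-pass vectorized structure applies to any distribution with a cheap `quantile`, which is why this route is attractive when the interval's probability mass is small and rejection sampling would waste most draws.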

devmotion avatar Jun 17 '25 19:06 devmotion

Thanks a lot for the reply, @devmotion

> For mixture models, each of the n variates could come from a different mixture component; so (naively) one would have to sample n mixture components,

I'm thinking of a different algorithm: use a multinomial draw first to split `n::Int` into group sizes `[n1, n2, ...]`, and call the individual distributions with their `n`.
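The multinomial-split idea could look roughly like this. This is a minimal Base-Julia sketch: the component counting is hand-rolled, plain closures stand in for Distributions.jl samplers, and names like `rand_mixture` are hypothetical.

```julia
using Random: shuffle!

# Draw per-component counts by sampling n category indices (O(n); a real
# implementation could use a multinomial sampler directly).
function component_counts(weights::Vector{Float64}, n::Int)
    counts = zeros(Int, length(weights))
    cum = cumsum(weights)
    for _ in 1:n
        k = searchsortedfirst(cum, rand())
        k = min(k, length(weights))   # guard against rounding of the cumulative sum
        counts[k] += 1
    end
    return counts
end

# Batch-sample the mixture: one vectorized call per component, then shuffle
# so the output order carries no component information.
function rand_mixture(samplers::Vector{<:Function}, weights::Vector{Float64}, n::Int)
    counts = component_counts(weights, n)
    out = Float64[]
    sizehint!(out, n)
    for (ni, s) in zip(counts, samplers)
        append!(out, s(ni))   # s(ni) returns a length-ni batch
    end
    return shuffle!(out)
end

# Two unit-variance normal components at -2 and +2, weights 0.3 / 0.7.
samplers = [n -> randn(n) .- 2, n -> randn(n) .+ 2]
x = rand_mixture(samplers, [0.3, 0.7], 1000)
```

The trade-off devmotion raises still applies: for many components with few variates each, the per-component batch calls may not beat the current one-at-a-time implementation.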

> For truncated distributions, it depends on the probability mass of the truncated interval whether

Good to learn about these cases; I'm curious whether there is an objective rationale behind the 0.25 threshold.

Clearly, the method expects `quantile` to work (one can always find an interval that falls into the "quantile case"), so it would not be inconsistent to fall back to a broadcasted `quantile`.

EDIT: actually, it's the other way around: broadcasting `quantile` is what I want to avoid. It should fall back to `rand(d0, n/tp + margin)`.
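The `rand(d0, n/tp + margin)` fallback could be sketched as oversampled batch rejection: draw an inflated batch from the untruncated sampler in one vectorized call, keep the in-interval draws, and top up if the batch comes back short. A minimal sketch in Base Julia, with a hypothetical `rand_truncated_batch` and an arbitrary `margin` default:

```julia
# Sketch: batch rejection sampling for a truncated distribution.
# `sampler(m)` draws m variates from the untruncated distribution;
# `tp` is the probability mass of the truncation interval [lo, hi].
function rand_truncated_batch(sampler, lo, hi, tp, n; margin = 100)
    out = Float64[]
    sizehint!(out, n)
    while length(out) < n
        # Oversample so one pass usually suffices in expectation.
        m = ceil(Int, (n - length(out)) / tp) + margin
        append!(out, filter(v -> lo <= v <= hi, sampler(m)))
    end
    return resize!(out, n)   # discard any surplus draws
end

# Example: standard normal truncated to [-1, 1] (tp ≈ 0.6827).
x = rand_truncated_batch(randn, -1.0, 1.0, 0.6827, 1000)
```

The expected number of untruncated draws is about `n/tp`, so this only pays off when `tp` is not too small; below some threshold the quantile route wins, which is presumably the point of the 0.25 cutoff mentioned above.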

mmikhasenko avatar Jun 17 '25 20:06 mmikhasenko

  1. I don't see an elegant way of handling the cases for `tp`. I wonder if there are better ways. For now, I implemented sampling using `rand(d0, n/tp + margin)`.
  2. Any ideas for tests are welcome.

mmikhasenko avatar Jun 17 '25 21:06 mmikhasenko