Gen.jl

Add Dirac distribution

Open sharlaon opened this issue 2 years ago • 4 comments

Previous built-in distributions appear to only use specific concrete types, not parametric ones, so necessarily there is some pattern-breaking here. I especially invite feedback about how to best handle this.

sharlaon • Oct 11 '23 01:10

Thanks @sharlaon! I think we have to be careful about adding this to Gen. It's fine if we use this for discrete random choices, but I think it breaks a bunch of the mathematical assumptions that other Gen code relies on (e.g. the trace translator code) once it is applied to a Dirac distribution over the real numbers. @alex-lew may have a write-up about this somewhere.

To avoid the issue, we might want to rename this delta_discrete (so that it represents a Kronecker delta function, essentially), and make sure it only accepts Integer (sub)types as parameters and arguments. Users could then extend this to support delta distributions over e.g. Strings or Symbols using the @dist DSL.
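For readers following along, here is a minimal sketch of what an integer-only `delta_discrete` could look like as a custom Gen distribution. It is not the code from the proposed change or from Gen itself: the type name `DeltaDiscrete` is hypothetical, and fixing the output type to `Int` is a simplifying assumption. It only uses the methods that Gen's built-in distributions implement (`random`, `logpdf`, `logpdf_grad`, `has_output_grad`, `has_argument_grads`, `is_discrete`).

```julia
using Gen

# Hypothetical sketch: a point mass (Kronecker delta) on one integer value `v`.
# Assumption: the output type is fixed to Int for simplicity.
struct DeltaDiscrete <: Gen.Distribution{Int} end
const delta_discrete = DeltaDiscrete()

# Sampling always returns the parameter itself.
Gen.random(::DeltaDiscrete, v::Integer) = Int(v)

# Log-probability mass: log(1) = 0 at `v`, log(0) = -Inf everywhere else.
Gen.logpdf(::DeltaDiscrete, x::Integer, v::Integer) = x == v ? 0.0 : -Inf

# No gradients: both the output and the parameter are discrete.
Gen.logpdf_grad(::DeltaDiscrete, x::Integer, v::Integer) = (nothing, nothing)
Gen.has_output_grad(::DeltaDiscrete) = false
Gen.has_argument_grads(::DeltaDiscrete) = (false,)

# Mark the distribution as discrete so downstream Gen code can tell.
Gen.is_discrete(::DeltaDiscrete) = true

# Make the distribution callable, like Gen's built-in distributions.
(::DeltaDiscrete)(v) = Gen.random(DeltaDiscrete(), v)
```

With these definitions, `delta_discrete(3)` returns `3`, and `Gen.logpdf(delta_discrete, 2, 3)` returns `-Inf`. Restricting the method signatures to `Integer` means a call with a `Float64` argument fails with a `MethodError` rather than silently acting as a continuous Dirac distribution.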

ztangent • Oct 11 '23 13:10

@ztangent How similar is your counter-proposal to categorical([1.0])? Could it be defined with @dist instead of being a new distribution?

lukego • Oct 11 '23 13:10

You can definitely do categorical([1.0]) or uniform_discrete(1, 1), which is often what I do! You can even do a "labeled uniform distribution" as follows:

@dist labeled_uniform(items) = items[uniform_discrete(1, length(items))]

However, due to the limitations of the @dist DSL, you can't actually do the following:

@dist delta_dist(item) = [item][uniform_discrete(1, 1)]

So you end up having to use labeled_uniform([item]) instead. That's mostly what I've been doing: slightly less ergonomic, but it works. It'd be nice to just have delta_dist as a built-in thing to avoid allocating when constructing a list, though!

ztangent • Oct 11 '23 17:10

Thank you for the explanations.

Those idioms are neater than what I've been using:

@dist function enum(xs)
    n = length(xs)
    xs[categorical(ones(n) / n)]
end

lukego • Oct 12 '23 07:10

You can definitely do categorical([1.0]) or uniform_discrete(1, 1), which is often what I do! You can even do a "labeled uniform distribution" as follows:

@dist labeled_uniform(items) = items[uniform_discrete(1, length(items))]

However, due to the limitations of the @dist DSL, you can't actually do the following:

@dist delta_dist(item) = [item][uniform_discrete(1, 1)]

So you end up having to use labeled_uniform([item]) instead.

Sorry, just seeing this now! For what it's worth, I often do this by writing

singleton(x) = [x]
@dist discrete_dirac(x) = singleton(x)[uniform_discrete(1,1)]

which, perhaps surprisingly, avoids the issue Xuan mentioned. (We should probably just fix the dist DSL to support var[expr].) I'm not against adding discrete_delta as a primitive, with a name (either that name or a similar one) that points to the dangers of using it for continuous variables, as Xuan suggested.

alex-lew • Feb 29 '24 21:02