Turing.jl
PG doesn't work with multivariate distributions
This was discovered by @owensnick in https://discourse.julialang.org/t/avoiding-loops-in-turing/59830.
```julia
using Turing, Distributions

# Simulate n observations from a two-class, five-position mixture:
# configs[c][p] is the mean for class c at position p.
n = 200
configs = [-820:180:-100, 280:180:1000]
class = rand(1:2, n);
pos = rand(1:5, n);
x = [rand(Normal(configs[c][p], 5)) for (c, p) in zip(class, pos)];
```
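For reference, each range expands to five well-separated means, so `configs[class[i]][pos[i]]` picks one of ten distinct values:

```julia
collect(configs[1])  # [-820, -640, -460, -280, -100]
collect(configs[2])  # [280, 460, 640, 820, 1000]
```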
The following model works as expected, producing something that looks reasonable:
```julia
@model function modelloop(x, configs, σ, ::Type{TV} = Vector{Int}) where {TV}
    n = length(x)
    w = [0.5, 0.5]
    p = ones(5) / 5
    class = TV(undef, n)
    pos = TV(undef, n)
    for i = 1:n
        class[i] ~ Categorical(w)
        pos[i] ~ Categorical(p)
        x[i] ~ Normal(configs[class[i]][pos[i]], σ)
    end
end

ml = modelloop(x, configs, 5)
chain_loop = sample(ml, PG(5), 100);
```
In contrast, the following "vectorized" version of the model yields samples from what looks like (and very likely is) the prior:
```julia
@model function modelvec(x, configs, σ)
    n = length(x)
    w = [0.5, 0.5]
    p = ones(5) / 5
    class ~ filldist(Categorical(w), n)
    pos ~ filldist(Categorical(p), n)
    x ~ arraydist(map((c, l) -> Normal(configs[c][l], σ), class, pos))
end

mv = modelvec(x, configs, 5);
chain_vec = sample(mv, PG(5), 100);
```
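One quick way to see this (a sketch; the indexing assumes MCMCChains' string-based `getindex`):

```julia
# If PG never reweights on the data, each class[i] keeps its
# Categorical([0.5, 0.5]) prior, whose mean over {1, 2} is 1.5.
# Posterior means far from 1.5 would indicate the data actually
# informed the samples.
mean(chain_vec["class[1]"])   # ≈ 1.5 when stuck at the prior
mean(chain_loop["class[1]"])  # typically pulled towards 1 or 2
```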
And, even stranger to me: if you write out the observation in a for loop, i.e.
```julia
for i = 1:n
    x[i] ~ Normal(configs[class[i]][pos[i]], σ)
end
```
in the vectorized model, you end up with something that looks reasonable only for `i = 1`.
I don't have time to look more deeply into it right now, but figured I'd put this here to hold us accountable.
@devmotion seemed to have some intuition of at least what's going wrong (from Slack):
> PG and SMC are based on propagation at every observe statement.
>
> So I guess the problem might be that we can't tell if the vectorized statement is a single observation or a set of observations.
It's a known limitation of the current Turing DSL. @devmotion's intuition is correct: particle samplers rely on the reweighting and resampling step at each observation to reduce dependency, and the vectorised model implementation makes it impossible to perform such resampling operations.

Interestingly, we could support vectorisation (for particle samplers) if we had a DAG structure similar to Soss. So it might become possible to support this once AbstractPPL incorporates some aspects of Soss (cc @phipsgabler @cscherrer). Overall, the imperative approach to probabilistic programming has its limits and can be enhanced by symbolic approaches.
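To make that concrete, here is how the two observation styles differ from the particle filter's point of view (a schematic reading of the models above, not Turing internals):

```julia
# Loop version: n separate observe statements. The sampler can reweight
# and resample the particle set after every single data point.
for i = 1:n
    x[i] ~ Normal(configs[class[i]][pos[i]], σ)  # observe -> reweight/resample
end

# Vectorised version: one observe statement covering all n data points.
# The sampler gets a single reweighting step for the whole dataset, so
# it effectively degenerates to importance sampling from the prior.
x ~ arraydist(map((c, l) -> Normal(configs[c][l], σ), class, pos))
```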
But this is only a problem for vectorized `observe`, not for vectorized `assume`, right? I.e. don't we expect
```julia
@model function modelvec(x, configs, σ)
    n = length(x)
    w = [0.5, 0.5]
    p = ones(5) / 5
    class ~ filldist(Categorical(w), n)
    pos ~ filldist(Categorical(p), n)
    for i = 1:n
        x[i] ~ Normal(configs[class[i]][pos[i]], σ)
    end
end

mv = modelvec(x, configs, 5);
chain_vec = sample(mv, PG(5), 100);
```
to work? This also fails.
Also, we really need to put this limitation clearly in the docs somewhere; even I couldn't find anything about this limitation after searching for a bit :confused:
`class` and `pos` are not `TArray`s in the last example, which is problematic, I assume?
The model formulation also looks a bit strange (there are no variables sampled/no hidden states when you move from one time step/observation to the next in the model execution), but I don't think this is a problem: you still reweight based on `class[i]` and `pos[i]`.
> `class` and `pos` are not `TArray`s in the last example, which is problematic, I assume?
Could this be fixed, though? I.e. just convert it to a `TArray`?
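Perhaps something like this (a hypothetical, untested sketch inside the model body; `class_vec` is a made-up name, and this would not address the single-observe issue for `x`):

```julia
# Hypothetical workaround: draw the whole vector at once, then copy it
# into a TArray so that particle copying can track the values. tzeros
# is the same helper that comes up later in this thread.
class_vec ~ filldist(Categorical(w), n)
class = tzeros(Int, n)
for i = 1:n
    class[i] = class_vec[i]
end
```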
> The model formulation also looks a bit strange
I agree with this, but you could imagine cases where the model is more appropriate but you're still sampling some variables in a multivariate fashion like this, no?
Hi all. Thanks for your efforts with Turing - really enjoying using it. The model is mine, just to be clear. In the full model, `config` is not fixed and is sampled from realisations of a stochastic process, and each observation is two `pos` samples from the same `class`. I was getting weird results when I tried the vectorised versions and ended up removing things from the model until I got to the above, although it's not a complete MWE, as I guess I could have shown it with `pos` alone.
How does one convert vectorised samples to `TArray`s? Is the following enough:
```julia
class = tzeros(Int, n)
class ~ filldist(Categorical(w), n)
```
Or would the second line automatically generate a `TArray`? I realise the issue with particle samplers and vectorised model implementations still holds.
@owensnick thanks for raising the issue. For this model, I think the best implementation is to avoid using `filldist` for defining latent variables. The loop-based implementation `modelloop` should be more efficient for the particle samplers, since it delays the instantiation of latent variables until they are needed. Such tricks are known as "delayed sampling" and are generally helpful for particle MCMC methods; see below for a reference.
- Murray, L., Lundén, D., Kudlicka, J., Broman, D., & Schön, T. (2018). Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs. In A. Storkey & F. Perez-Cruz (Eds.), Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (Vol. 84, pp. 1037–1046). PMLR.
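Schematically, the per-step interleaving in `modelloop` is what makes this work (the comments are my gloss on the models above, not Turing internals):

```julia
for i = 1:n
    class[i] ~ Categorical(w)  # latent variables are only instantiated
    pos[i]   ~ Categorical(p)  # once a particle reaches step i...
    # ...and the particle is reweighted on x[i] right away, so unlikely
    # (class, pos) assignments can be resampled away before the
    # remaining n - i latents are ever drawn.
    x[i] ~ Normal(configs[class[i]][pos[i]], σ)
end
```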
@yebai that makes sense - thanks a lot.