Turing.jl
PG doesn't work with multivariate distributions
This was discovered by @owensnick in https://discourse.julialang.org/t/avoiding-loops-in-turing/59830.
```julia
using Turing, Distributions

# Simulate n observations from a two-class, five-position mixture:
# configs[c][p] is the mean for class c at position p.
n = 200
configs = [-820:180:-100, 280:180:1000]
class = rand(1:2, n);
pos = rand(1:5, n);
x = [rand(Normal(configs[c][p], 5)) for (c, p) in zip(class, pos)];
```
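For reference, each range expands to five well-separated means, so `configs[class[i]][pos[i]]` picks one of ten distinct values:

```julia
collect(configs[1])  # [-820, -640, -460, -280, -100]
collect(configs[2])  # [280, 460, 640, 820, 1000]
```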
The following model works as expected, producing something that looks reasonable:
```julia
@model function modelloop(x, configs, σ, ::Type{TV} = Vector{Int}) where {TV}
    n = length(x)
    w = [0.5, 0.5]
    p = ones(5) / 5
    class = TV(undef, n)
    pos = TV(undef, n)
    for i = 1:n
        class[i] ~ Categorical(w)
        pos[i] ~ Categorical(p)
        x[i] ~ Normal(configs[class[i]][pos[i]], σ)
    end
end

ml = modelloop(x, configs, 5)
chain_loop = sample(ml, PG(5), 100);
```
In contrast, the following "vectorized" version of the model yields samples from what looks like (and very likely is) the prior:
```julia
@model function modelvec(x, configs, σ)
    n = length(x)
    w = [0.5, 0.5]
    p = ones(5) / 5
    class ~ filldist(Categorical(w), n)
    pos ~ filldist(Categorical(p), n)
    x ~ arraydist(map((c, l) -> Normal(configs[c][l], σ), class, pos))
end

mv = modelvec(x, configs, 5);
chain_vec = sample(mv, PG(5), 100);
```
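One quick way to see this (a sketch; the indexing assumes MCMCChains' string-based `getindex`):

```julia
# If PG never reweights on the data, each class[i] keeps its
# Categorical([0.5, 0.5]) prior, whose mean over {1, 2} is 1.5.
# Posterior means far from 1.5 would indicate the data actually
# informed the samples.
mean(chain_vec["class[1]"])   # ≈ 1.5 when stuck at the prior
mean(chain_loop["class[1]"])  # typically pulled towards 1 or 2
```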
And, even stranger to me: if you write out the observation in a for loop, i.e.
```julia
for i = 1:n
    x[i] ~ Normal(configs[class[i]][pos[i]], σ)
end
```
in the vectorized model, you end up with something that looks reasonable only for `i = 1`.
I don't have time to look more deeply into it right now, but figured I'd put this here to hold us accountable.
@devmotion seemed to have some intuition of at least what's going wrong (from Slack):
> PG and SMC are based on propagation at every observe statement.
>
> So I guess the problem might be that we can't tell if the vectorized statement is a single observation or a set of observations.
It's a known limitation of the current Turing DSL. @devmotion's intuition is correct: particle samplers rely on the reweighting and resampling step at each observation to reduce dependency, and the vectorised model implementation makes it impossible to perform such resampling operations.

Interestingly, we could support vectorisation (for particle samplers) if we had a DAG structure similar to Soss. So it might become possible to support this once AbstractPPL incorporates some aspects of Soss (cc @phipsgabler @cscherrer). Overall, the imperative approach to probabilistic programming has its limits and can be enhanced by symbolic approaches.
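To make that concrete, here is how the two observation styles differ from the particle filter's point of view (a schematic reading of the models above, not Turing internals):

```julia
# Loop version: n separate observe statements. The sampler can reweight
# and resample the particle set after every single data point.
for i = 1:n
    x[i] ~ Normal(configs[class[i]][pos[i]], σ)  # observe -> reweight/resample
end

# Vectorised version: one observe statement covering all n data points.
# The sampler gets a single reweighting step for the whole dataset, so
# it effectively degenerates to importance sampling from the prior.
x ~ arraydist(map((c, l) -> Normal(configs[c][l], σ), class, pos))
```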
But this is only a problem for vectorized `observe`, not for vectorized `assume`, right? I.e. don't we expect
```julia
@model function modelvec(x, configs, σ)
    n = length(x)
    w = [0.5, 0.5]
    p = ones(5) / 5
    class ~ filldist(Categorical(w), n)
    pos ~ filldist(Categorical(p), n)
    for i = 1:n
        x[i] ~ Normal(configs[class[i]][pos[i]], σ)
    end
end

mv = modelvec(x, configs, 5);
chain_vec = sample(mv, PG(5), 100);
```
to work? This also fails.
Also, we really need to put this limitation clearly in the docs somewhere; even I couldn't find anything about this limitation after searching for a bit :confused:
`class` and `pos` are not `TArray`s in the last example, which is problematic, I assume?
The model formulation also looks a bit strange (there are no variables sampled/no hidden states when you move from one time step/observation to the next in the model execution), but I don't think this is a problem: you still reweight based on `class[i]` and `pos[i]`.
> `class` and `pos` are not `TArray`s in the last example, which is problematic, I assume?
Could this be fixed, though? I.e. just convert it to a `TArray`?
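Perhaps something like this (a hypothetical, untested sketch inside the model body; `class_vec` is a made-up name, and this would not address the single-observe issue for `x`):

```julia
# Hypothetical workaround: draw the whole vector at once, then copy it
# into a TArray so that particle copying can track the values. tzeros
# is the same helper that comes up later in this thread.
class_vec ~ filldist(Categorical(w), n)
class = tzeros(Int, n)
for i = 1:n
    class[i] = class_vec[i]
end
```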
> The model formulation also looks a bit strange
I agree with this, but you could imagine cases where the model is more appropriate but you're still sampling some variables in a multivariate fashion like this, no?
Hi all. Thanks for your efforts with Turing - really enjoying using it. The model is mine, just to be clear. In the full model, `config` is not fixed and is sampled from realisations of a stochastic process, and each observation is two `pos` samples from the same `class`. I was getting weird results when I tried the vectorised versions and ended up removing things from the model until I got to the above, although it's not a complete MWE, as I guess I could have shown it with `pos` alone.
How does one convert vectorised samples to `TArray`s? Is the following enough:
```julia
class = tzeros(Int, n)
class ~ filldist(Categorical(w), n)
```
Or would the second line automatically generate a `TArray`? I realise the issue with particle samplers and vectorised model implementations still holds.
@owensnick thanks for raising the issue. For this model, I think the best implementation is to avoid using `filldist` for defining latent variables. The loop-based implementation `modelloop` should be more efficient for the particle samplers, since it delays the instantiation of latent variables until they are needed. Such tricks are known as "delayed sampling" and are generally helpful for particle MCMC methods; see below for a reference.
- Murray, L., Lundén, D., Kudlicka, J., Broman, D., & Schön, T. (2018). Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs. In A. Storkey & F. Perez-Cruz (Eds.), Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics (Vol. 84, pp. 1037–1046). PMLR.
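Schematically, the per-step interleaving in `modelloop` is what makes this work (the comments are my gloss on the models above, not Turing internals):

```julia
for i = 1:n
    class[i] ~ Categorical(w)  # latent variables are only instantiated
    pos[i]   ~ Categorical(p)  # once a particle reaches step i...
    # ...and the particle is reweighted on x[i] right away, so unlikely
    # (class, pos) assignments can be resampled away before the
    # remaining n - i latents are ever drawn.
    x[i] ~ Normal(configs[class[i]][pos[i]], σ)
end
```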
@yebai that makes sense - thanks a lot.