MCMCChains.jl
MCMCChains.jl copied to clipboard
Order of Array Parameters is not Preserved by `get`
If I have a chain with array parameters, get returns them in some random order, not numeric:
c = Chains(randn(100, 13, 4), ["x[$(i)]" for i in 13:-1:1])
xs = get(c, :x).x
all(xs[1] .== c[Symbol("x[1]")]) # false
all(xs[1] .== c[Symbol("x[13]")]) # true
This is a really bad bug for people who work with arrays....
in some random order
They are returned in the order in which they appear in the chain. Since you constructed the chain in such a way that the first parameter belonging to x is x[13], x[13] is the first column returned by get.
You can use sort to reorder the parameters in natural sort order (or any other order but this is the default):
sorted_chain = sort(c)
sorted_xs = get(sorted_chain, :x).x
sorted_xs[1] == c["x[1]"] # true
sorted_xs[2] == c["x[2]"] # true
sorted_xs[11] == c["x[11]"] # true
This may affect the reliability of the summarystats() output.
In a multi-level model like this.
y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]
@model function eight(;y=missing, σ=sigma, N=8)
y = y===missing ? Array{Float64}(undef, N) : y
μ ~ Normal(0, 5)
τ ~ truncated(Cauchy(0, 5), 0, Inf)
θ ~ filldist(Normal(μ, τ), N)
@. y ~ Normal(θ, σ)
end
The obtained chain has θ parameters in haphazard order.
group(chain, :θ)
OUTPUT:
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
Wall duration = 0.5 seconds
Compute duration = 0.5 seconds
parameters = θ[5], θ[7], θ[8], θ[2], θ[1], θ[4], θ[6], θ[3]
internals =
Summary Statistics
parameters mean std naive_se mcse ess rhat e ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
θ[1] 7.0462 6.4852 0.2051 0.4285 268.2643 1.0079 ⋯
θ[2] 5.3364 5.1565 0.1631 0.2275 351.0179 0.9996 ⋯
θ[3] 3.7642 5.6877 0.1799 0.3353 375.4434 0.9991 ⋯
θ[4] 4.8493 5.3475 0.1691 0.3289 346.8819 1.0046 ⋯
θ[5] 3.1562 5.1128 0.1617 0.2497 351.7596 1.0020 ⋯
θ[6] 3.8590 5.2401 0.1657 0.3042 358.8378 1.0002 ⋯
θ[7] 7.2531 5.9002 0.1866 0.3476 262.3535 1.0127 ⋯
θ[8] 5.1030 6.0514 0.1914 0.3020 414.4987 1.0040 ⋯
The ordering of θ parameters is indicated as θ[5], θ[7], θ[8], θ[2], θ[1], θ[4], θ[6], θ[3] so that the result of the summarystat(chain) reflect this ordering. But the actual printout of summarystat(chain) confuse the user because the parameters column lists the θ parameters as θ[1], θ[2]....
In the above example, the printout mean value of θ[1] 7.0462 is actually the man value of θ[5] (the first one in the order).
As a newbie like me, this weird behavior made me quite frustrating.
Thanks.
I was also bitten by this.
By the way, a similar confusion can arise with simple chains[[...]] indexing:
julia> chains = Chains([10i+j for i in 1:2, j in 1:5], [:a, :b, :c, :d, :e]);
julia> Array(chains[[:d, :a, :e]])
2×3 Matrix{Int64}:
11 14 15
21 24 25
Here the matrix columns are not in the "requested" order. (Is there an easy way to get them in the desired order?)
I guess a switch to AxisKeys (as suggested in #231) would help here too...