MCMCChains.jl icon indicating copy to clipboard operation
MCMCChains.jl copied to clipboard

Order of Array Parameters is not Preserved by `get`

Open farr opened this issue 3 years ago • 4 comments

If I have a chain with array parameters, get returns them in some random order, not numeric:

c = Chains(randn(100, 13, 4), ["x[$(i)]" for i in 13:-1:1])
xs = get(c, :x).x

all(xs[1] .== c[Symbol("x[1]")]) # false
all(xs[1] .== c[Symbol("x[13]")]) # true

This is a really bad bug for people who work with arrays....

farr avatar Jun 16 '21 18:06 farr

in some random order

They are returned in the order in which they appear in the chain. Since you constructed the chain in such a way that the first parameter belonging to x is x[13], x[13] is the first column returned by get.

devmotion avatar Jun 16 '21 18:06 devmotion

You can use sort to reorder the parameters in natural sort order (or any other order but this is the default):

sorted_chain = sort(c)
sorted_xs = get(sorted_chain, :x).x

sorted_xs[1] == c["x[1]"] # true
sorted_xs[2] == c["x[2]"] # true
sorted_xs[11] == c["x[11]"] # true

devmotion avatar Jun 17 '21 07:06 devmotion

This may affect the reliability of the summarystats() output.

In a multi-level model like this.

y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
sigma = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]

@model function eight(;y=missing, σ=sigma, N=8)
    y = y===missing ? Array{Float64}(undef, N) : y
    
    μ ~ Normal(0, 5)
    τ ~ truncated(Cauchy(0, 5), 0, Inf)
    θ ~ filldist(Normal(μ, τ), N)
    
    @. y ~ Normal(θ, σ)
end

The obtained chain has θ parameters in haphazard order.

group(chain, :θ)

OUTPUT:

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 0.5 seconds
Compute duration  = 0.5 seconds
parameters        = θ[5], θ[7], θ[8], θ[2], θ[1], θ[4], θ[6], θ[3]
internals         = 

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   e ⋯
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64     ⋯

        θ[1]    7.0462    6.4852     0.2051    0.4285   268.2643    1.0079     ⋯
        θ[2]    5.3364    5.1565     0.1631    0.2275   351.0179    0.9996     ⋯
        θ[3]    3.7642    5.6877     0.1799    0.3353   375.4434    0.9991     ⋯
        θ[4]    4.8493    5.3475     0.1691    0.3289   346.8819    1.0046     ⋯
        θ[5]    3.1562    5.1128     0.1617    0.2497   351.7596    1.0020     ⋯
        θ[6]    3.8590    5.2401     0.1657    0.3042   358.8378    1.0002     ⋯
        θ[7]    7.2531    5.9002     0.1866    0.3476   262.3535    1.0127     ⋯
        θ[8]    5.1030    6.0514     0.1914    0.3020   414.4987    1.0040     ⋯

The ordering of θ parameters is indicated as θ[5], θ[7], θ[8], θ[2], θ[1], θ[4], θ[6], θ[3] so that the result of the summarystat(chain) reflect this ordering. But the actual printout of summarystat(chain) confuse the user because the parameters column lists the θ parameters as θ[1], θ[2].... In the above example, the printout mean value of θ[1] 7.0462 is actually the man value of θ[5] (the first one in the order).

As a newbie like me, this weird behavior made me quite frustrating.

Thanks.

AnselmJeong avatar Jul 31 '21 23:07 AnselmJeong

I was also bitten by this.

By the way, a similar confusion can arise with simple chains[[...]] indexing:

julia> chains = Chains([10i+j for i in 1:2, j in 1:5], [:a, :b, :c, :d, :e]);

julia> Array(chains[[:d, :a, :e]])
2×3 Matrix{Int64}:
 11  14  15
 21  24  25

Here the matrix columns are not in the "requested" order. (Is there an easy way to get them in the desired order?)

I guess a switch to AxisKeys (as suggested in #231) would help here too...

knuesel avatar Aug 04 '21 09:08 knuesel