StanSample.jl

inferencedata errors when model contains matrix parameters

Open sethaxen opened this issue 2 years ago • 5 comments

julia> using StanSample, InferenceObjects

julia> model = """
       parameters {
         matrix[2, 3] x;
       }
       model {
         for (i in 1:2)
           x[i,:] ~ std_normal();
       }
       """;

julia> sm = SampleModel("foo", model);

julia> rc = stan_sample(sm);

julia> inferencedata(sm)
ERROR: ArgumentError: no valid permutation of dimensions
Stacktrace:
  [1] permutedims(B::Array{Float64, 4}, perm::Tuple{Int64, Int64, Int64})
    @ Base ./multidimensional.jl:1596
  [2] extract(chns::Array{Float64, 3}, cnames::Vector{String}; permute_dims::Bool)
    @ StanSample ~/.julia/packages/StanSample/tYGEA/src/utils/namedtuples.jl:40
  [3] extract
    @ StanSample ~/.julia/packages/StanSample/tYGEA/src/utils/namedtuples.jl:7 [inlined]
  [4] convert_a3d(a3d_array::Array{Float64, 3}, cnames::Vector{String}, ::Val{:permuted_namedtuples})
    @ StanSample ~/.julia/packages/StanSample/tYGEA/src/utils/namedtuples.jl:106
  [5] read_csv_files(m::SampleModel, output_format::Symbol; include_internals::Bool, chains::UnitRange{…}, start::Int64, kwargs::@Kwargs{})
    @ StanSample ~/.julia/packages/StanSample/tYGEA/src/stansamples/read_csv_files.jl:116
  [6] read_csv_files
    @ ~/.julia/packages/StanSample/tYGEA/src/stansamples/read_csv_files.jl:23 [inlined]
  [7] #read_samples#10
    @ ~/.julia/packages/StanSample/tYGEA/src/stansamples/read_samples.jl:93 [inlined]
  [8] read_samples
    @ ~/.julia/packages/StanSample/tYGEA/src/stansamples/read_samples.jl:84 [inlined]
  [9] inferencedata(m::SampleModel; include_warmup::Bool, log_likelihood_var::Nothing, posterior_predictive_var::Nothing, predictions_var::Nothing, kwargs::@Kwargs{})
    @ InferenceObjectsExt ~/.julia/packages/StanSample/tYGEA/ext/InferenceObjectsExt.jl:85
 [10] inferencedata(m::SampleModel)
    @ InferenceObjectsExt ~/.julia/packages/StanSample/tYGEA/ext/InferenceObjectsExt.jl:76
 [11] top-level scope
    @ REPL[25]:1
Some type information was truncated. Use `show(err)` to see complete types.
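
Frame [1] of the trace shows a three-element permutation being passed to `permutedims` for a four-dimensional array, plausibly because the permutation was written with scalar and vector parameters in mind. A minimal sketch of a permutation built from `ndims` instead is shown here. `draws_first` is a hypothetical helper for illustration, not StanSample's code, and the layout (parameter dimensions first, chain dimension last) is an assumption.

```julia
# Hypothetical helper (not part of StanSample): move the last dimension
# to the front for an array of any dimensionality.
function draws_first(A::AbstractArray)
    n = ndims(A)
    # Build (n, 1, 2, ..., n-1) so the tuple length always matches ndims(A).
    perm = (n, ntuple(identity, n - 1)...)
    return permutedims(A, perm)
end

# Assumed layout for a vector parameter: (param index, draws, chains).
vec_param = zeros(6, 10, 4)
# Assumed layout for a 2x3 matrix parameter: (rows, cols, draws, chains).
mat_param = zeros(2, 3, 10, 4)

size_vec = size(draws_first(vec_param))
size_mat = size(draws_first(mat_param))
```

Because the permutation is derived from the array itself, the same call covers the three-dimensional and the four-dimensional case.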

Environment

julia> versioninfo()
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 11 on 8 virtual cores
Environment:
  JULIA_CMDSTAN_HOME = /home/sethaxen/software/cmdstan/2.33.1/
  JULIA_NUM_THREADS = auto
  JULIA_EDITOR = code

(jl_WirtqY) pkg> st
Status `/tmp/jl_WirtqY/Project.toml`
  [b5cf5a8d] InferenceObjects v0.3.13
  [c1514b29] StanSample v7.4.5

sethaxen, Nov 08 '23 23:11

Hi Seth,

Thanks for filing an issue with an MWE. It is definitely not working.

I will take a look asap. Hopefully later today, but likely tomorrow.

Best, Rob

goedman, Nov 09 '23 13:11

StanSample.jl v7.5.0 contains a fix for this issue. I've added a limited test for a matrix variable (as in your posted issue) but would like to test this for arrays in general as well.
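
One way to test arrays in general is to rebuild a single draw from the flattened CmdStan CSV column names (`x.1.1`, `x.2.1`, ...) and compare it with the expected array. The sketch below assumes names of the form `name.i.j...` with 1-based integer indices. `unflatten` is a hypothetical helper for illustration, not a function from StanSample.

```julia
# Hypothetical helper: rebuild an N-dimensional array for one draw from
# flattened column names such as "x.1.1", "x.2.1", ...
function unflatten(names::Vector{String}, values::Vector{Float64})
    # Drop the variable name and parse the remaining pieces as indices.
    idx = [parse.(Int, split(n, '.')[2:end]) for n in names]
    nd = length(first(idx))
    # The largest index seen along each dimension gives the array size.
    dims = Tuple(maximum(i[d] for i in idx) for d in 1:nd)
    A = zeros(dims)
    for (i, v) in zip(idx, values)
        A[i...] = v
    end
    return A
end

# Column names for `matrix[2, 3] x`, listed in column-major order.
names = ["x.1.1", "x.2.1", "x.1.2", "x.2.2", "x.1.3", "x.2.3"]
values = collect(1.0:6.0)
x = unflatten(names, values)
```

Since the indices are read from the names rather than assumed from their order, the same helper applies to higher-dimensional arrays.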

You've probably seen Brian's suggestion to move Stan related I/O to a separate package. I'm still considering the pros and cons of such an effort, but a huge pro would be to clean up code that has been updated for many, many years.

It would also be a good opportunity to add support for complex variables in JSON input files and to handle tuple (and complex?) outputs in generated CSV files. I will probably try these out in the current setup first.

goedman, Nov 10 '23 21:11

> StanSample.jl v7.5.0 contains a fix for this issue.

Thanks! Indeed, it works for me!

> You've probably seen Brian's suggestion to move Stan related I/O to a separate package.

Thanks for the pointer, I hadn't seen that yet. From the ArviZ perspective, it's a bit tricky to support variables that cannot be trivially flattened into an array of reals. There are effectively 3 useful representations of draws:

  • Something close to the data structure the user created. If the variable was represented as a tuple of arrays, then the draws would be an array of tuples of arrays.
  • Something useful for analysis and long-term storage. Virtually all standard analyses require real marginals or tables. Same with plots. So the most useful representation here is flattening all data structures to real numbers or arrays.
  • Something like MonteCarloMeasurements.jl or posterior's rvar, where the marginal draws are packed into something representing a real number, which again allows for data structures that mimic what the user created in the PPL.
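
As a rough illustration of the first two representations, here is a minimal sketch in plain Julia. The variable shapes and the `flatten` helper are made up for the example and are not part of InferenceObjects.jl or StanSample.jl.

```julia
# Representation 1: each draw mirrors the user's structure, here a tuple
# holding a length-2 vector and a 2x2 matrix (shapes chosen for illustration).
draws = [(fill(Float64(i), 2), fill(Float64(i), 2, 2)) for i in 1:100]

# Representation 2: flatten every draw to a vector of reals and stack the
# results into a draws-by-reals matrix.
flatten(draw) = reduce(vcat, map(vec, draw))
flat = permutedims(reduce(hcat, map(flatten, draws)))
```

Going from the first form to the second discards the nesting, so converting back needs the original shapes to be stored alongside the flat array.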

From the perspective of ArviZ.jl, the 2nd is by far the most useful. But for Julia PPLs, where draws can technically be arbitrary Julia types, it would be useful to support the 1st option as well and support interconversion. This was low priority in the past, but Turing now has Cholesky objects as recommended variables, so we need to decide how to support this. Stan's tuple support also makes this a high priority. I haven't decided how to do this yet, but something like https://github.com/arviz-devs/InferenceObjects.jl/issues/27 is a possibility.

sethaxen, Nov 17 '23 23:11

Thanks Seth,

Your 2nd argument is spot on (maybe a key reason why I always seem to switch back to DataFrames in the end).

My current goal for StanIO.jl is to flesh out output_format=:nesteddataframe (which is trivial to convert to a NamedTuple). Complex vars are easy to deal with given the .imag and .real name extensions. Arrays are also fairly easy.
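
A minimal sketch of the complex case, assuming the draws arrive as columns named `z.real` and `z.imag`. The `Dict` stands in for a parsed CSV and the values are made up for illustration.

```julia
# Stand-in for parsed CSV columns of a complex scalar `z`.
cols = Dict(
    "z.real" => [1.0, 2.0, 3.0],
    "z.imag" => [0.5, -1.0, 0.0],
)

# Pair the two real columns elementwise into one complex vector.
z = complex.(cols["z.real"], cols["z.imag"])
```

The same pairing applies elementwise to arrays of complex values once the real and imaginary columns have been matched by name.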

Pure tuples are also ok; tuples mixed with arrays (and vice versa) are a bit more complex.

Rob

goedman, Nov 25 '23 21:11