rstan
read_stan_csv gives non-reproducible results
Summary:
When importing runs from CmdStan that are exactly the same (two calls to CmdStan with the same seed, data, etc.), one ends up with numerical differences. Running the sampler in RStan directly twice gives exactly the same results. Also, importing the raw CSV file with read.csv2 gives exactly the same data both times.
Description:
Attached is an example.
Reproducible Steps:
See the zip file
Current Output:
Included in zip file. The 2_14_0/normal_exact_repo_bug.Rout file contains the log demonstrating the problem. The R/normal_exact_repo_bug.Rout log shows that things are OK when everything is run inside RStan.
Expected Output:
Stuff should be the same when imported with read_stan_csv.
RStan Version:
2.14.1
R Version:
3.2.1 (but it also happened on 3.3.2)
Operating System:
RHEL 6.7, but it also happened on macOS Sierra
Ok, I finally solved the mystery here. The good news is that everything is exactly reproducible, but the behavior is inconsistent overall.
So what happened is that the extract function shuffles the MCMC results. The shuffling is apparently not done when no warmup is performed.
Moreover, it appears to me that when using rstan to do the sampling and then extracting the results, the shuffling is done consistently with the random seed given (which is good). When using read_stan_csv, however, the shuffling is not done consistently. As a result you get differently shuffled posteriors when using the extract function; results only match exactly if you use the as.matrix extractors.
Long story short: Would it be possible to read out the random seed from the cmdstan output and put that into the rstan object representing the fit?
We should at least add to the documentation of read_stan_csv that the random number generator state matters when calling the read_stan_csv function. The much better solution would be to extract that seed from the csv file and use that one.
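To make the mechanism concrete, here is a minimal base-R sketch. `import_draws()` and `permuted()` are hypothetical stand-ins for an importer that draws its permutation from R's global RNG at import time; they are not rstan's actual code, only an illustration of why the RNG state at import would matter.

```r
# Hypothetical stand-in for an importer that permutes draws using R's
# global RNG at import time. This is NOT rstan's implementation.
import_draws <- function(draws) {
  list(draws = draws, perm = sample(length(draws)))
}

# Permuted view of the draws, analogous in spirit to extract(fit).
permuted <- function(x) x$draws[x$perm]

draws <- seq(0.1, 10, by = 0.1)  # pretend these are identical CSV contents

# Two imports of the same data under different RNG states: the stored
# draws are identical, but the permuted views differ.
set.seed(1); a <- import_draws(draws)
set.seed(2); b <- import_draws(draws)
same_raw      <- identical(a$draws, b$draws)
same_permuted <- identical(permuted(a), permuted(b))

# Fixing the RNG state before each import makes the permuted view repeatable.
set.seed(123); c1 <- import_draws(draws)
set.seed(123); c2 <- import_draws(draws)
same_after_seed <- identical(permuted(c1), permuted(c2))
```

If the same reasoning holds for rstan, calling `set.seed()` with a fixed value immediately before each `read_stan_csv()` call, or sidestepping the permutation with `extract(fit, permuted = FALSE)` or `as.matrix(fit)`, should give matching results.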
The shuffling of the MCMC results by default is evil, and as discussed elsewhere there are many who think it should be removed, which would also solve this problem.
@bgoodri Would you accept changing the default to permuted=FALSE? I just realized I was extracting samples with two different extract calls and losing the parameter covariance that way. It's a source of all sorts of bugs.
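A small base-R sketch of how mismatched orderings can destroy the covariance. The comment does not say which two calls were involved, so this assumes one parameter was taken in permuted order and the other in sampling order; `mu`, `theta`, and `perm` are made up for illustration.

```r
set.seed(42)
n <- 4000
mu    <- rnorm(n)
theta <- mu + rnorm(n, sd = 0.1)  # strongly correlated with mu by construction

perm <- sample(n)                 # stand-in for a permutation applied to draws

# Consistent ordering: both in sampling order, or both permuted the same way.
cor_joint         <- cor(mu, theta)
cor_both_permuted <- cor(mu[perm], theta[perm])

# Mixed ordering: one parameter permuted, the other in sampling order.
# The draws no longer pair up, so the correlation collapses toward zero.
cor_mixed <- cor(mu[perm], theta)
```

The marginal distributions of `mu` and `theta` are unchanged in every case, which is why this kind of bug is easy to miss: only joint summaries go wrong.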
The plan is to change a lot of things in RStan (the biggest is moving to reference classes) this spring. The permutation of the draws will be (happily) discarded at that time!
Why were the draws shuffled/permuted in the first place? Seems like a strange default behavior.
Yes