read_stan_csv gives non reproducible results

Open wds15 opened this issue 8 years ago • 7 comments

Summary:

When importing two cmdstan runs that are exactly the same (two calls to cmdstan with the same seed, data, etc.) with read_stan_csv, one ends up with numerical differences. By contrast, running the sampler in rstan directly twice gives exactly the same results. Also, when importing the raw csv files with read.csv2, one does get exactly the same data read in.

Description:

Attached is an example.

Reproducible Steps:

See the zip file

rstan_read_stan_csv_bug.zip

Current Output:

Included in zip file. The 2_14_0/normal_exact_repo_bug.Rout file contains the log demonstrating the problem. The R/normal_exact_repo_bug.Rout log shows that things are OK when everything is run inside RStan.

Expected Output:

Stuff should be the same when imported with read_stan_csv.

RStan Version:

2.14.1

R Version:

3.2.1 (but it also happened on 3.3.2)

Operating System:

RHEL 6.7, but also happened on macOS sierra

wds15 avatar Feb 16 '17 09:02 wds15

Ok, I finally solved the mystery here and the good news is that everything is exactly reproducible, but the behavior is inconsistent overall.

So what happened is that the extract function shuffles the mcmc results. The shuffling is apparently not done when no warmup is performed.

Moreover, it appears to me that when using rstan to do the sampling and then extraction of the results, the shuffling is done consistently with the random seed given (which is good). Now when using read_stan_csv the shuffling is not done consistently. As a result you get differently shuffled posteriors when using the extract function - results only match exactly if you use the as.matrix extractors.
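To illustrate the effect described above, here is a minimal base R sketch. It does not call rstan; `shuffle_draws` is a hypothetical stand-in for a shuffle that takes its permutation from R's global RNG, which is an assumption about the mechanism rather than rstan's actual code.

```r
# Toy stand-in for posterior draws: 6 iterations of one parameter.
draws <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)

# Hypothetical helper: shuffle using whatever state the global RNG is in.
shuffle_draws <- function(x) x[sample.int(length(x))]

set.seed(1); a  <- shuffle_draws(draws)
set.seed(2); b  <- shuffle_draws(draws)
set.seed(1); a2 <- shuffle_draws(draws)

# Same RNG state before the shuffle gives the same ordering.
same_state_same_order <- identical(a, a2)

# The set of values never changes, only their order.
same_values <- identical(sort(a), sort(b))
```

If the shuffle works like this, calling `set.seed()` right before extracting should make the ordering repeatable, while an unshuffled extractor such as as.matrix does not depend on the RNG state at all.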

Long story short: Would it be possible to read out the random seed from the cmdstan output and put that into the rstan object representing the fit?

wds15 avatar Feb 16 '17 13:02 wds15

We should at least add to the documentation of read_stan_csv that the random number gen state matters when calling the read_stan_csv function. The much better solution would be to extract that seed from the csv file and use that one.
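A rough sketch of what extracting the seed could look like, in base R only. The header lines below are illustrative and assume the seed appears in a comment line of the form `# seed = <integer>`; `parse_seed` is a hypothetical helper, not an existing rstan function.

```r
# Illustrative header lines (assumed layout, not copied from a real run).
header <- c(
  "# model = normal_model",
  "# random",
  "#   seed = 12345",
  "# output",
  "lp__,accept_stat__,mu"
)

# Hypothetical helper: return the seed from the first matching comment line,
# or NA if no such line exists. Numeric rather than integer, since seeds can
# exceed R's integer range.
parse_seed <- function(lines) {
  pattern <- "^#[[:space:]]*seed[[:space:]]*=[[:space:]]*([0-9]+).*$"
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  as.numeric(sub(pattern, "\\1", hit[1]))
}

seed <- parse_seed(header)
```

With real output one would pass something like `readLines(csv_path)` (where `csv_path` is a placeholder for the file location) instead of the toy `header` vector.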

wds15 avatar Feb 16 '17 13:02 wds15

The shuffling of the mcmc results by default is evil, and as discussed elsewhere there are many who think it should be removed, which would also solve this problem.

avehtari avatar Feb 16 '17 20:02 avehtari

@bgoodri Would you accept changing the default to permuted=FALSE? I just realized I was extracting samples with two different extract calls and losing the parameter covariance that way. It's a source of all sorts of bugs.
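A small base R sketch of why this loses the covariance, assuming each extraction applies its own independent permutation. The parameters `alpha` and `beta` are simulated here and no rstan call is involved.

```r
set.seed(123)
n <- 1000
alpha <- rnorm(n)
beta  <- alpha + rnorm(n, sd = 0.1)   # strongly correlated with alpha

# One shared permutation keeps each (alpha, beta) pair on the same row.
perm <- sample.int(n)
cor_joint <- cor(alpha[perm], beta[perm])

# Two independent permutations, as if each parameter came from its own
# shuffled extraction: the pairing between draws is destroyed.
cor_split <- cor(alpha[sample.int(n)], beta[sample.int(n)])
```

Here `cor_joint` matches the correlation of the unshuffled draws, while `cor_split` is close to zero. The marginals are still fine in both cases, which is what makes this kind of bug easy to miss.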

sakrejda avatar Feb 16 '17 20:02 sakrejda

The plan is to change a lot of things in RStan (the biggest is moving to reference classes) this spring. The permutation of the draws will be (happily) discarded at that time!

jgabry avatar Feb 16 '17 21:02 jgabry

Why were the draws shuffled/permuted in the first place? Seems like a strange default behavior.

bdeonovic avatar Jul 29 '22 15:07 bdeonovic

Yes

bgoodri avatar Jul 29 '22 15:07 bgoodri