rstan
Too Many Items error when reading large CSV
Summary:
When a CSV of posterior samples contains more than INT_MAX (2,147,483,647) values in total, the CSV cannot be loaded.
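Because scan returns every value in a single vector, the limit applies to the product of saved draws and columns, not to either one alone. Here is a rough check with hypothetical sizes (not taken from the report):

$$
N_{\text{draws}} \times N_{\text{columns}} > 2^{31} - 1 = 2{,}147{,}483{,}647
$$

For example, 2,000 draws × 1,100,000 columns = 2,200,000,000 values, which exceeds the limit. Each individual column still holds only 2,000 values.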
Description:
Because RStan relies on base R's scan function to load the CSV, and scan reads all values into a single vector, loading is limited by the maximum length of that vector.
This was also documented here: https://groups.google.com/forum/#!topic/stan-users/XvroDYe_yJc
Although R supports long vectors, scan does not yet support them. I wrote a patch to scan that fixes the issue:
https://gist.github.com/aaronjg/f39e5966687ca004dab5a10e7655c648
It may also be possible to fix this without patching base R, by using read.table instead of scan, because read.table does not need to load everything into a single vector.
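The column-wise idea can be illustrated outside R. The following is a minimal Python sketch, not rstan or read.table code. It assumes Stan's usual CSV layout: comment lines start with `#` and are followed by one header row, then one row per draw. The function name `read_columns_sketch` is hypothetical. Each column gets its own array, so the longest array holds one value per draw instead of draws × columns values.

```python
import array
import csv
import io


def read_columns_sketch(lines):
    """Read a Stan-style CSV into one float array per column (hypothetical helper).

    Lines starting with '#' (Stan comment/adaptation lines) are skipped.
    Because each column is stored separately, no single array grows to
    draws * columns elements.
    """
    rows = csv.reader(line for line in lines if not line.startswith("#"))
    header = next(rows)
    columns = [array.array("d") for _ in header]
    for row in rows:
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(row)}")
        for col, value in zip(columns, row):
            col.append(float(value))
    return dict(zip(header, columns))


sample = "# comment\nlp__,theta\n-7.1,0.25\n# adaptation info\n-6.9,0.31\n"
draws = read_columns_sketch(io.StringIO(sample))
print({name: list(values) for name, values in draws.items()})
```

R's read.table takes a similar approach when it asks scan for a list with one component per column. Whether that avoids the limit in practice for rstan is the open question raised above.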
Reproducible Steps:
Generate a Stan output CSV with more than INT_MAX total values, then try to load it with the read_stan_csv function.
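Before attempting a multi-gigabyte load, it may help to confirm that a file actually crosses the threshold. The following is a minimal Python sketch that streams the file without storing any values. It assumes Stan's convention that comment lines begin with `#`, and both helper names are hypothetical.

```python
import csv

INT_MAX = 2**31 - 1  # 2,147,483,647, the largest signed 32-bit integer


def count_stan_csv_values(lines):
    """Stream a Stan-style CSV and return (n_draws, n_columns).

    Comment lines starting with '#' are skipped and no values are kept,
    so memory use stays small even for very large files.
    """
    rows = csv.reader(line for line in lines if not line.startswith("#"))
    header = next(rows)
    n_draws = sum(1 for row in rows if row)  # ignore blank lines
    return n_draws, len(header)


def exceeds_int_max(n_draws, n_columns):
    """True if the total number of values (draws * columns) exceeds INT_MAX."""
    return n_draws * n_columns > INT_MAX
```

For a real file, open it with `open(path, newline="")` and pass the file object to `count_stan_csv_values`. The path here is a placeholder for your own output file.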
Current Output:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : too many items Calls: stan -> sampling -> sampling -> .local -> scan
Expected Output:
RStan should read the CSV correctly.
RStan Version:
2.17.2
R Version:
3.5.0
Operating System:
Ubuntu, 64 bit.
I have also encountered this issue. It would be good to switch rstan to read.table if that is an option. That would be much better than patching scan! Cheers