Too Many Items error when reading large CSV

Open aaronjg opened this issue 6 years ago • 1 comment

Summary:

When reading in posterior samples from a CSV with more than INT_MAX total elements (2,147,483,647), the CSV cannot be loaded.

Description:

RStan relies on base R's scan function to load the CSV, so it is limited by the maximum vector length that scan can return.

This was also documented here: https://groups.google.com/forum/#!topic/stan-users/XvroDYe_yJc

Even though R supports long vectors, they are not yet implemented in the scan function. I created a patch to the scan function, linked below, which fixes the issue.

https://gist.github.com/aaronjg/f39e5966687ca004dab5a10e7655c648

There may also be a way to fix this without patching base R using read.table rather than scan, since that doesn't require loading everything into a single vector.
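A minimal sketch of that read.table idea, not RStan's actual implementation: it reads the draws in row chunks from an open connection, so no single call has to return every value in one atomic vector. The helper name `read_draws_chunked` and the `chunk_rows` parameter are hypothetical, and the sketch assumes a Stan-style CSV with `#` comment lines, one header row, and numeric columns only.

```r
# Hypothetical helper (not part of RStan): read a Stan-style CSV in row chunks.
read_draws_chunked <- function(path, chunk_rows = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Skip leading comment and blank lines until the header row is found.
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) stop("no header line found in ", path)
    if (nzchar(line) && !startsWith(line, "#")) break
  }
  header <- strsplit(line, ",", fixed = TRUE)[[1]]

  chunks <- list()
  repeat {
    # Each call continues from the current position of the open connection.
    # Comment-only lines are skipped and do not count towards nrows.
    chunk <- read.table(con, sep = ",", nrows = chunk_rows,
                        comment.char = "#", col.names = header,
                        colClasses = "numeric", check.names = FALSE)
    if (nrow(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
    if (nrow(chunk) < chunk_rows) break  # short chunk: end of file reached
  }

  # Columns of the result are as long as the number of draws,
  # not the total number of values in the file.
  do.call(rbind, chunks)
}
```

With this layout each column holds one value per draw, so the limit that matters is the number of rows rather than rows times columns. Whether this fits into read_stan_csv without other changes is untested here.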

Reproducible Steps:

Generate a Stan output CSV with more than INT_MAX total values, and try to load it with the read_stan_csv function.
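To check whether a given fit would cross the limit before writing it out, the total can be estimated from the number of saved draws and the number of CSV columns. The helper name `exceeds_scan_limit` and the example sizes below are made up for illustration; the sizes are passed as doubles so the product does not itself overflow R's integer type.

```r
# Hypothetical helper: does a CSV of this shape hold more than INT_MAX values?
exceeds_scan_limit <- function(n_draws, n_cols) {
  # as.numeric() avoids integer overflow (NA with a warning) in the product.
  total_values <- as.numeric(n_draws) * as.numeric(n_cols)
  total_values > .Machine$integer.max  # .Machine$integer.max is 2147483647
}

# Illustrative sizes only: 4000 draws of 600000 columns is 2.4e9 values.
exceeds_scan_limit(4000, 600000)
# 4000 draws of 1000 columns is 4e6 values, well under the limit.
exceeds_scan_limit(4000, 1000)
```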

Current Output:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  too many items
Calls: stan -> sampling -> sampling -> .local -> scan

Expected Output:

RStan should read the CSV without error.

RStan Version:

2.17.2

R Version:

3.5.0

Operating System:

Ubuntu, 64 bit.

aaronjg, May 23 '18 03:05

I have also encountered this issue. It would be good to shift to read.table in rstan if that is an option. Much better than patching scan! Cheers

JHarrisonEcoEvo, Apr 08 '21 17:04