
Error when estimating Gaussian Copula model on large dataset

Open jmbh opened this issue 2 years ago • 4 comments

Hi Donny,

I'm getting the following error when estimating a Gaussian Copula model with a large dataset:

BGGM: Posterior Sampling
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
error: Cube::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD
Error in estimate(data + 1, iter = 5000, type = "mixed", seed = floor(runif(1, :
  Cube::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

I'm using BGGM 2.0.4 and this code:

out_bggm_cop <- estimate(data + 1, iter = 5000, type = "mixed", # gcop
                         seed = floor(runif(1, 0, 1000)), progress = TRUE)

You can find the data here: https://jmbh.github.io/data/data_donny.RDS

Thanks, Jonas

jmbh avatar Nov 16 '22 14:11 jmbh

Hey,

That looks like an error from Armadillo.

You probably have to try enabling the ARMA_64BIT_WORD flag the error message suggests.

I have never done that before...

donaldRwilliams avatar Nov 16 '22 16:11 donaldRwilliams

Is this something I can do outside of BGGM? Or do I have to fork it and create my own version that does that?

Thanks, Jonas

jmbh avatar Nov 18 '22 13:11 jmbh

Pretty sure you can do it outside of BGGM.

[Screenshot attachment: Screenshot_20221118-074040_Chrome]

donaldRwilliams avatar Nov 18 '22 15:11 donaldRwilliams

This is still an issue in the latest version (2.1.1). As Donny noted, the suggested flag needs to be added to the etc/Makevars files and the package has to be re-compiled with that flag. I'm hesitant to add it to the standard version, as increasing the word size from 32 to 64 bit might be beyond what some systems can handle in terms of memory allocation. We might give it a try in a development version and see if things break...
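For reference, a minimal sketch of that recompile route, assuming a local copy of the BGGM source and that the flag goes into the package-level src/Makevars (the usual location for an Rcpp-based package; the exact file may differ here):

## Hypothetical sketch: enabling ARMA_64BIT_WORD in a local copy of BGGM.
## 1. In the package's Makevars (src/Makevars is assumed), append the flag:
##      PKG_CPPFLAGS += -DARMA_64BIT_WORD=1
## 2. Reinstall from the modified source (path assumed):
install.packages("path/to/BGGM", repos = NULL, type = "source")
## Or, if the change lives in a GitHub fork (account name hypothetical):
# remotes::install_github("yourusername/BGGM")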

You could, as you noted, fork the project, add those flags, recompile it locally, and hope that the 64-bit word size solves the issue. You could also, if feasible, take a couple of random subsamples of smaller size (e.g., 20,000 rows or so), run a couple of models, and aggregate those results; a sketch follows below.
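A minimal sketch of that subsampling idea, assuming data is the full data frame from the original post and that each fitted object's pcor_mat element holds the posterior-mean partial correlations (both assumptions):

## Hypothetical sketch of the subsample-and-aggregate workaround.
library(BGGM)

set.seed(123)
n_sub  <- 5        # number of random subsamples (assumption)
n_rows <- 20000    # rows per subsample, as suggested above

fits <- lapply(seq_len(n_sub), function(i) {
  idx <- sample(nrow(data), n_rows)
  estimate(data[idx, ] + 1, iter = 5000, type = "mixed", progress = FALSE)
})

## Average the posterior-mean partial correlation matrices across fits
## (pcor_mat is assumed to hold them in each estimate object).
pcor_avg <- Reduce(`+`, lapply(fits, `[[`, "pcor_mat")) / n_sub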

ph-rast avatar Mar 22 '24 21:03 ph-rast