s-jSDM
sjSDM - memory issues of anova()
via Email:
I'm trying to run an anova on an sjSDM model, as shown on the GitHub page:
an = anova(model)
Note that I'm using a remote server, and the model was fitted on a GPU. Here is the error I get:
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 59.78 GiB. GPU 0 has a total capacity of 44.35 GiB of which 43.83 GiB is free. Including non-PyTorch memory, this process has 526.00 MiB memory in use. Of the allocated memory 179.71 MiB is allocated by PyTorch, and 24.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
I opened the terminal and tried:
set 'PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True'
But it didn't change anything. The GPU has 48 GB of memory.
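A note on the environment variable: running `set` in a separate terminal only affects that terminal session, so an R session that is already running will not see the value. A minimal sketch of setting it from within R instead, assuming it has to be in place before sjSDM (and with it PyTorch) is loaded. This option targets memory fragmentation only, so it cannot help when a single request (59.78 GiB here) exceeds the GPU's total capacity (44.35 GiB).

```r
# Set the allocator option inside the R session itself, so that the
# Python/PyTorch process embedded in R inherits it.
# Assumption: this runs before sjSDM is loaded and before any CUDA use.
Sys.setenv(PYTORCH_CUDA_ALLOC_CONF = "expandable_segments:True")

# Confirm that the variable is visible to this R process.
Sys.getenv("PYTORCH_CUDA_ALLOC_CONF")

# library(sjSDM)  # load the package only after the variable is set
```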
Hi,
The model tried to allocate about 60 GB, but your GPU has only 48 GB. Possible ways to reduce the memory consumption of your model:
- Decrease the step_size
- Decrease sampling
What are the dimensions of your data? And could you please share your code? There may be another problem.
Best, Max
Hi, thanks for the quick answer. The data comprises 31,000 rows and 2,000 species, and I only included 3 variables to check how the model would run with a smaller set of variables. It ran in 7 minutes. Here is the code I used:
model <- sjSDM(Y = RLS_matrix,
               env = linear(data = as.matrix(all_cov_selection),
                            formula = ~ Sand + Rock + seagrass),
               spatial = linear(data = as.matrix(coords),
                                formula = ~ 0 + longitude:latitude),
               se = FALSE,
               family = binomial("logit"),
               sampling = 100L,
               device = "gpu")
Thanks
step_size defaults to 10% of your data, which can consume a lot of memory for large datasets, so you might want to set it explicitly to a smaller value when fitting your full data:
model <- sjSDM(Y = RLS_matrix,
               env = linear(data = as.matrix(all_cov_selection),
                            formula = ~ .),
               spatial = linear(data = as.matrix(coords),
                                formula = ~ 0 + longitude:latitude),
               se = FALSE,
               step_size = 100L,
               family = binomial("logit"),
               sampling = 100L,
               device = "gpu")
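To see why step_size matters at this scale, here is a rough back-of-the-envelope sketch. It assumes (this is not confirmed in the thread) that each optimization step holds at least one tensor with one 32-bit float per site in the batch, per Monte Carlo sample, per species. The helper `tensor_gib()` is hypothetical and purely illustrative. Real memory use will differ, and this is not an account of the 59.78 GiB request in the error message.

```r
n_sites   <- 31000    # rows reported in the thread
n_species <- 2000     # species reported in the thread
sampling  <- 100      # value passed to sjSDM() above
bytes_per_value <- 4  # assumption: 32-bit floats

# Hypothetical helper: size in GiB of one tensor with
# step_size x sampling x n_species entries (illustrative only).
tensor_gib <- function(step_size) {
  step_size * sampling * n_species * bytes_per_value / 1024^3
}

default_step <- 0.1 * n_sites  # 10% of the data, as described above

default_step              # batch size implied by the default
tensor_gib(default_step)  # assumed tensor size with the default
tensor_gib(100)           # assumed tensor size with step_size = 100L
```

Under these assumptions the per-step tensor shrinks in proportion to step_size, so going from the default to 100 reduces it by a factor of 31.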
Thanks, it worked: the command finished without errors. However, when I ran
an = anova(model)
the progress bar reached the end in 7 minutes, but the process never finished and the 'an' object never appeared in the environment. I had the same issue when I ran the model on my own machine (with device = "cpu"): the anova took hours to run, and in the end the object was not created. Any clues?
Hi @Loic-sanchez,
The anova can be very slow. The default number of MC samples in the anova is 5000 (samples = 5000L), so maybe rerun it with only 100-500 samples:
an = anova(model, samples = 500L)
Hello, the same thing happens: the anova runs quickly (~7 minutes), but the process never seems to end, and no object is created.
I ran the function with both 500 and 100 samples.