doAzureParallel
Warning message: In quit... system call failed: Cannot allocate memory
Some of my nodes are failing with this error:
Warning message: In quit(save = "yes", status = workerErrorStatus, runLast = FALSE) : system call failed: Cannot allocate memory
Does this mean I need a CPU with more memory?
Hi @ctlamb
Yes, this means you will need a VM with more memory. I suggest measuring the memory usage for each task so you have a benchmark of what Azure VM to use.
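One way to get that benchmark is the CRAN package peakRAM, which profiles elapsed time and RAM per expression. A minimal sketch with made-up steps (substitute the expressions from your actual task); note that peakRAM's output columns match the readout posted further down in this thread:

```r
# A minimal sketch of per-step memory benchmarking with the CRAN package
# peakRAM (one option, not the only one). The steps here are placeholders.
# install.packages("peakRAM")
library(peakRAM)

mem <- peakRAM(
  x <- rnorm(1e7),   # hypothetical step: allocates roughly 76 MiB
  y <- sort(x),      # hypothetical step: sorting allocates a copy
  z <- sum(y)
)
print(mem)
# Returns a data frame with columns:
# Function_Call, Elapsed_Time_sec, Total_RAM_Used_MiB, Peak_RAM_Used_MiB
```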
Thanks Brian
Excellent, will do. In the meantime, I tried a machine with slightly more memory ("vmSize" = "Standard_E4_v3"), but I am running into the following error after I run foreach (this error doesn't occur with "vmSize" = "Standard_DS12_v2"):
```
## Error: No automatic parser available for 7b/.
```
What region are you in? It could be possible that Standard_E4_v3 is not available in your region. Is this happening during makeCluster?
I'm in West US. The error is thrown in foreach.
It looks like my tasks only use a max of 8GB of RAM, so the 28GB of RAM I had in the "Standard_DS12_v2" should've been plenty. Hmm, not sure what's going on here.
Memory usage readout:

| | Function_Call | Elapsed_Time_sec | Total_RAM_Used_MiB | Peak_RAM_Used_MiB |
|---:|---|---:|---:|---:|
| 1 | `doAzureParallel::setCredentials(credentials)` | 0.005 | 0.0 | 0.0 |
| 2 | `mod <- mod.files$FilePath[bp$model[i]]` | 0.000 | 0.0 | 0.0 |
| 3 | `tile <- r.files$FilePath[bp$tile[i]]` | 0.000 | 0.0 | 0.0 |
| 4 | `doAzureParallel::getStorageFile(container = "occmodels", blobPath = paste0(mod), downloadPath = paste0(mod), overwrite = TRUE)` | 49.721 | 190.6 | 190.6 |
| 5 | `brt <- readRDS(paste0(mod))` | 10.233 | 665.8 | 665.8 |
| 6 | `doAzureParallel::getStorageFile(container = "rastertiles", blobPath = paste0(tile), downloadPath = paste0(tile), overwrite = TRUE)` | 496.358 | 1996.0 | 1996.0 |
| 7 | `unzip(paste0(tile), exdir = here::here(), junkpaths = TRUE, overwrite = TRUE)` | 27.612 | 0.0 | 0.0 |
| 8 | `raster_data <- list.files(here::here(), pattern = ".tif$", full.names = TRUE)` | 0.150 | 0.0 | 0.0 |
| 9 | `STACK <- raster::stack(raster_data)` | 2.337 | 0.3 | 6.0 |
| 10 | `STACK[["CutBlock_Occurrence"]] <- ratify(STACK[["CutBlock_Occurrence"]])` | 5.092 | 0.0 | 1161.7 |
| 11 | `STACK[["Fire_Occ"]] <- ratify(STACK[["Fire_Occ"]])` | 5.012 | 0.0 | 1161.7 |
| 12 | `STACK[["CRDP_LC"]] <- ratify(STACK[["CRDP_LC"]])` | 5.132 | 0.0 | 1161.7 |
| 13 | `STACK[["MODIS_LC"]] <- ratify(STACK[["MODIS_LC"]])` | 4.990 | 0.0 | 1161.7 |
| 14 | `pred <- dismo::predict(STACK, brt, n.trees = brt$gbm.call$best.trees, type = "response")` | 22156.271 | 387.8 | 8056.5 |
| 15 | `return(pred)` | 0.000 | 0.0 | 0.0 |
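For context, the sketch below reconstructs how those calls presumably sit inside the foreach task body. The individual calls come from the readout above; the loop wrapper, index range, and `.packages` argument are assumptions:

```r
# Assumed reconstruction of the per-task body from the readout above.
# bp, mod.files, r.files, and credentials are objects referenced in the
# readout; the foreach wrapper itself is a guess. Assumes a cluster has
# already been registered with registerDoAzureParallel().
results <- foreach(i = 1:nrow(bp),
                   .packages = c("raster", "dismo", "here")) %dopar% {
  doAzureParallel::setCredentials(credentials)

  mod  <- mod.files$FilePath[bp$model[i]]
  tile <- r.files$FilePath[bp$tile[i]]

  # Pull the fitted BRT model and the zipped raster tile from blob storage
  doAzureParallel::getStorageFile(container = "occmodels",
                                  blobPath = mod, downloadPath = mod,
                                  overwrite = TRUE)
  brt <- readRDS(mod)
  doAzureParallel::getStorageFile(container = "rastertiles",
                                  blobPath = tile, downloadPath = tile,
                                  overwrite = TRUE)
  unzip(tile, exdir = here::here(), junkpaths = TRUE, overwrite = TRUE)

  # Assemble the predictor stack and mark the categorical layers
  raster_data <- list.files(here::here(), pattern = ".tif$", full.names = TRUE)
  STACK <- raster::stack(raster_data)
  for (lyr in c("CutBlock_Occurrence", "Fire_Occ", "CRDP_LC", "MODIS_LC")) {
    STACK[[lyr]] <- raster::ratify(STACK[[lyr]])
  }

  # This predict() call is where the ~8 GB peak RAM is hit (row 14 above)
  dismo::predict(STACK, brt, n.trees = brt$gbm.call$best.trees,
                 type = "response")
}
```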
Are you setting maxTasksPerNode greater than 1 in your cluster configuration?
No, it's set to 1:
```r
clusterConfig <- list(
  "name" = "LambRaster",
  "vmSize" = "Standard_DS12_v2",
  "maxTasksPerNode" = 1,
  "poolSize" = list(
    "dedicatedNodes" = list("min" = 1, "max" = 200),
    "lowPriorityNodes" = list("min" = 0, "max" = 0),
    "autoscaleFormula" = "QUEUE"
  ),
  "containerImage" = "rocker/geospatial:latest",
  "rPackages" = list(
    "cran" = c("doParallel", "here", "dismo", "gbm", "snow"),
    "github" = c("Azure/doAzureParallel"),
    "bioconductor" = c()
  ),
  "commandLine" = list()
)
```
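For reference, here's a sketch of how a config like this plugs into the usual doAzureParallel workflow (the credentials file name and the loop body are placeholders, not my actual code):

```r
library(doAzureParallel)

# "credentials.json" is a placeholder for a file generated with
# generateCredentialsConfig() and filled in with your Batch/storage keys.
setCredentials("credentials.json")

# makeCluster also accepts a path to a cluster JSON file instead of a list.
cluster <- makeCluster(clusterConfig)
registerDoAzureParallel(cluster)

# Dummy job: report which Batch node ran each task.
results <- foreach(i = 1:4) %dopar% {
  Sys.getenv("AZ_BATCH_NODE_ID")
}

stopCluster(cluster)
```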
Is there a better/preferred package I could use to measure the memory usage?
Now getting `Error: No automatic parser available for 7b/.` even when I use the D12 machine. Ugh, it's always hard to troubleshoot one issue (memory) when another pops up. Any thoughts? I could start a new thread if it's easier.
I don't have a preferred package for measuring memory usage. Where exactly is this error occurring? Is this when the foreach is getting results?
If you have a cluster configuration file and a reproducible sample, I will work on identifying the issue.
This is the same as issue #315. I've spent many an hour pulling my hair out over this issue and I've no idea what's causing it. I've provided a lot of qualitative information in #315 but haven't had time to build a fully reproducible example at the scale which I think is generating the error.
@ctlamb is your workflow using resource files uploaded to Azure storage? My workflow is and I haven't been able to determine whether the 7b error still occurs when not using resource files. I'd like to attempt to rule out whether resource files could be contributing in some way.
Yes, I am uploading and downloading data to Azure storage in my workflow. I do wonder if this was an internet issue? My internet speed was recently upgraded, and I haven't got the 7b error since, but that's only based on 5-10 different tries so far. Will update if anything changes.
Thanks for the extra information. My latest post at #315 documents the return of the dreaded 7b error.
I considered your idea here as well. However, my university network is a gigabit connection and it's rock stable. My home internet is a 100Mb fibre connection, which is also super reliable (for the most part).
I wonder if there's a limit to the number of connections Batch/httr can accept from a single IP address? I'm currently running two pools on my laptop (home network) and three on my uni workstation, and they've all been stable all day. If I try to run any more pools than this on either machine, the 7b error returns almost instantly. It's very strange...
Are all of your workflows in interactive mode? (Waiting for the job to be done)
Thanks, Brian
Mine is, yes.
Any news on the status of this error? It's still happening to me with frustrating regularity.
Thanks!