Error generating azbatchenv rds file

Open fermumen opened this issue 3 years ago • 3 comments

I got a somewhat weird error when running code on Azure Batch that works correctly with regular doParallel. This is the job's stderr:

running '/usr/local/lib/R/bin/R --no-echo --no-restore --no-save --no-environ --no-restore --no-site-file --file=/mnt/batch/tasks/workitems/job20210326153929/job-1/jobpreparation/wd/worker.R --args 10 10 0 pass'
Error in readRDS(paste0(batchJobPreparationDirectory, "/", batchJobEnvironment)) :
  error reading from connection
Execution halted

I've downloaded the job.rds from Azure Blob Storage and indeed I can't read it on my computer either. How could I troubleshoot this?
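One way to narrow this down locally is to check whether the downloaded file looks like a complete, gzip-compressed RDS file or a truncated one. The sketch below is only an illustration: `check_rds` is a hypothetical helper, not part of doAzureParallel, and the demo truncates a temporary file on purpose to show how a cut-off upload might behave.

```r
# Hypothetical helper: report basic facts about an .rds file and try to read it.
check_rds <- function(path) {
  size_bytes <- file.size(path)
  # Files written by saveRDS() with default compression start with the gzip magic bytes 1f 8b
  magic <- readBin(path, what = "raw", n = 2L)
  is_gzip <- length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
  # Capture the error message instead of stopping, so good and bad files can be compared
  read_result <- tryCatch(
    {
      readRDS(path)
      "ok"
    },
    error = function(e) conditionMessage(e)
  )
  list(size_bytes = size_bytes, is_gzip = is_gzip, read_result = read_result)
}

# Demo: a valid file and a deliberately truncated copy of it
good <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = runif(10000)), good)
bytes <- readBin(good, what = "raw", n = file.size(good))
bad <- tempfile(fileext = ".rds")
writeBin(bytes[seq_len(length(bytes) %/% 2)], bad)

good_check <- check_rds(good)
bad_check <- check_rds(bad)
```

Running `check_rds("job.rds")` on the file downloaded from Blob Storage and comparing `size_bytes` with the size of the same environment saved locally could indicate whether the upload was cut short.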

I've tried the same code with just a subset of the data (~10%) and it seems to work correctly. Is there a limit on how much data can be uploaded to storage from doAzureParallel?
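To put a number on "how much data", it may help to measure how large the exported object becomes once serialized, and to confirm that it survives a local save and read. This is a minimal sketch under assumptions: `rds_round_trip` is a hypothetical helper, `example_data` is a stand-in for the real data frame, and nothing here reflects a documented doAzureParallel limit.

```r
# Hypothetical helper: write an object to a temporary .rds file, read it back,
# and report the sizes involved and whether the round trip preserved the object.
rds_round_trip <- function(obj) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  saveRDS(obj, path)
  restored <- readRDS(path)
  list(
    in_memory_bytes = as.numeric(object.size(obj)),
    file_bytes = file.size(path),
    identical = identical(obj, restored)
  )
}

# Stand-in data; replace with the data frame used inside the foreach loop
example_data <- data.frame(id = seq_len(1000), value = rnorm(1000))
report <- rds_round_trip(example_data)
```

If the round trip succeeds locally for the full dataset but the file in Blob Storage is smaller than `report$file_bytes`, that would point at the upload step rather than at serialization.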

Hi @fermumen,

Does the foreach loop finish without any errors? Also, are you using the error handling option?

Thanks, Brian

Hi, all the jobs finish with errors, but I think the failure happens in the job preparation stage. I have tried filtering the data frame down to ~60% of its size with different random samples, and it works as it should; it only fails when I use the full dataset (~900k observations). The code I'm running is a tune_grid call, which uses %dopar% internally:

cl <- make_azbatch_cluster("rf_pool3", cran_libraries = c("ranger", "tidymodels"),
                           CPU = 4, tasks_per_node = 1,
                           low_priority_nodes = list(min = 25,
                                                     max = 25))
esc_grid_results <- esc_workflow %>%
  tune_grid(resamples, # %dopar%
            grid = esc_grid,
            control = tune::control_grid(verbose = TRUE,
                                         parallel_over = "everything"))


Maybe I can try to generate a randomised example for you to reproduce.
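A randomised stand-in could be generated along these lines. This is only a sketch: `make_fake_data` is a hypothetical helper, and the column names, types, and counts are placeholders that do not reflect the real dataset.

```r
# Hypothetical generator for a synthetic data frame of a given size.
# Columns x1..xN are numeric predictors, `group` is a factor, `y` is the outcome.
make_fake_data <- function(n_rows, n_numeric = 5L, seed = 1L) {
  set.seed(seed)
  fake <- as.data.frame(matrix(rnorm(n_rows * n_numeric), nrow = n_rows))
  names(fake) <- paste0("x", seq_len(n_numeric))
  fake$group <- factor(sample(letters[1:4], n_rows, replace = TRUE))
  fake$y <- rnorm(n_rows)
  fake
}

# Small version for a quick check; scale n_rows up to match the failing case
fake_small <- make_fake_data(1000)
# fake_full <- make_fake_data(900000)
```

Adjusting `n_numeric` so that the serialized size of the synthetic data is close to that of the real data would make the reproduction more faithful.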

