batchtools

[Pre-pull request] Adding jobs to an existing registry

Open cfhammill opened this issue 5 years ago • 6 comments

Frequently I run a set of jobs in parallel with batchMap, only to realize later that I forgot to include an interesting case in the input list. Historically I've either made a new registry (ugh) or deleted and re-run everything (more-ugh). Today I really didn't want to do either, so I figured out how to add jobs to an existing registry.

Is this something you'd consider adding if I put together a PR? I suspect the answer is probably "you should be using the experiment abstraction", but I think enough people run into this problem that it would be beneficial to add. I've included code at the bottom for doing it very roughly. If it's going to be part of the package, I'd write something like batchUpdateMap, which assembles the new param list for the user.

Example code for doing it manually, in case anyone needs it in the meantime:

reg <- loadRegistry(reg, writeable = TRUE)

previous_max_id <- max(reg$status$job.id)
new_id <- previous_max_id + 1
new_params <- list(some = pars) #get skeleton from reg$defs$job.pars[[1]]

#Add row to job definitions
reg$defs <- 
  rbind(reg$defs
      , data.table(def.id = new_id
                 , job.pars = list(list(new_params)))) #closing paren for rbind() was missing

setkey(reg$defs, "def.id") #reset data.table key

#Add row to status table
reg$status <-
  rbind(reg$status
      , data.table(job.id = new_id, def.id = new_id, submitted = NA_real_, 
                   started = NA_real_, done = NA_real_, error = NA_character_, 
                   mem.used = NA_real_, resource.id = NA_integer_, batch.id = NA_character_, 
                   log.file = NA_character_, job.hash = NA_character_, job.name = NA_character_, 
                   key = "job.id"))

setkey(reg$status, "job.id") #reset data.table key

saveRegistry(reg) #Save our updates

Obviously this can be generalized for adding more than one job.
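
A minimal sketch of that generalization, assuming the table layout used in the snippet above (a defs table keyed on def.id with a job.pars list column, and a status table keyed on job.id). The helper name append_jobs is hypothetical and not part of batchtools. It works on plain data.tables, so it can be tried without a registry, and it relies on rbindlist(fill = TRUE) to fill the remaining status columns with NA instead of spelling each one out.

```r
library(data.table)

# Hypothetical helper (not a batchtools function): append one job per element
# of `new_pars`, where each element is a list of parameters for one job.
# Assumes the column layout from the manual snippet above.
append_jobs <- function(defs, status, new_pars) {
  stopifnot(is.list(new_pars), length(new_pars) > 0L)

  # Continue numbering after the largest existing job id
  first_id <- if (nrow(status) > 0L) max(status$job.id) + 1L else 1L
  new_ids <- first_id + seq_along(new_pars) - 1L

  new_defs <- data.table(def.id = new_ids, job.pars = new_pars)
  new_status <- data.table(job.id = new_ids, def.id = new_ids)

  # fill = TRUE leaves submitted/started/done/... as NA for the new rows
  defs <- rbindlist(list(defs, new_defs), use.names = TRUE, fill = TRUE)
  status <- rbindlist(list(status, new_status), use.names = TRUE, fill = TRUE)

  # rbindlist drops keys, so restore them
  setkeyv(defs, "def.id")
  setkeyv(status, "job.id")

  list(defs = defs, status = status, job.ids = new_ids)
}
```

With a writeable registry loaded, the result would be assigned back to reg$defs and reg$status before calling saveRegistry(reg), as in the manual version.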

cfhammill avatar Nov 09 '18 16:11 cfhammill

that's funny I was doing pretty much the same hack yesterday

+1 for adding jobs to an existing registry

tdhock avatar Nov 09 '18 16:11 tdhock

I can include something like this. What do you want the interface to look like? Re-running batchMap() or something like addJobs(params = list())?

mllg avatar Nov 12 '18 07:11 mllg

I'd be interested in something along the lines of re-running batchMap, but with a different name e.g. batchMapAddition.

Originally I was thinking that the new call should be required to use the same function as the original batchMap call, but maybe that constraint isn't particularly useful.

cfhammill avatar Nov 15 '18 14:11 cfhammill

Also, as mentioned in the title I'm happy to write it, but if you'd like more control over the implementation and want to write it yourself just let me know.

cfhammill avatar Nov 15 '18 14:11 cfhammill

my use case for adding jobs to an existing registry involves dependencies #204 between jobs that each have different functions, so it would be useful if each job could have its own function

tdhock avatar Nov 15 '18 15:11 tdhock

> my use case for adding jobs to an existing registry involves dependencies #204 between jobs that each have different functions, so it would be useful if each job could have its own function

Lifting all restrictions is probably better than only allowing more jobs to be added for the same function. However, this requires extensive refactoring and is not easy to implement in a backward-compatible fashion. I can give it a shot, but I'm currently quite busy with other projects, so this will probably not get done before January. 😞

If one of you wants to start a PR, here are the most important steps to consider:

  • In batchMap, the tuple of user function and more.args must be stored under a unique file name (derived from its hash), and the hash must be stored in reg$defs.
  • batchMap just needs to append jobs. If you provide a different function or different more.args, it will automatically be used only for the new jobs. The possibility to "patch" a function for already defined jobs can be added later.
  • JobCollections must store the hash to identify the function to load on the slave.
  • Job$fun() and Job$pars() must read from the new locations.
  • The update routine has to adjust old registries to the new file system structure on first load.
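
The first and third steps could be sketched roughly as follows. This is only an illustration under stated assumptions: store_fun and load_fun are hypothetical names, the directory layout is made up, and tools::md5sum on the serialized file stands in for whatever hashing the package would actually use. Note that serializing a closure also captures its enclosing environment, so a real implementation would need to decide how to treat that when hashing.

```r
# Hypothetical helper: write the (fun, more.args) tuple to `dir` under a file
# name derived from the MD5 of its serialized form, and return that hash.
# An identical tuple maps to the same file, so it is stored only once.
store_fun <- function(dir, fun, more.args = list()) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  tmp <- tempfile(tmpdir = dir)
  saveRDS(list(fun = fun, more.args = more.args), tmp)
  hash <- unname(tools::md5sum(tmp))
  target <- file.path(dir, paste0(hash, ".rds"))
  if (file.exists(target)) {
    file.remove(tmp)          # tuple already stored, discard the duplicate
  } else {
    file.rename(tmp, target)
  }
  hash
}

# Hypothetical counterpart for the worker side: look the tuple up by the hash
# that would be recorded per job (for example in reg$defs).
load_fun <- function(dir, hash) {
  readRDS(file.path(dir, paste0(hash, ".rds")))
}
```

Each new job definition would then carry the returned hash, so that jobs added later with a different function or different more.args resolve to their own stored tuple.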

mllg avatar Nov 16 '18 22:11 mllg