batchtools
[Pre-pull request] Adding jobs to an existing registry
I frequently run a set of jobs in parallel with batchMap, only to realize later that I forgot to include an interesting case in the input list. Historically I've either made a new registry (ugh) or deleted and re-run everything (more ugh). Today I really didn't want to do either, so I figured out how to add jobs to an existing registry.
Is this something you'd consider adding if I put together a PR? I suspect the answer is "you should be using the experiment abstraction", but I think enough people run into this problem that it would be worth adding. I've included rough code at the bottom for doing it manually. If it becomes part of the package, I'd write something like batchUpdateMap, which assembles the new parameter list for the user.
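A batchUpdateMap()-style function would mainly need to turn vectorised arguments, passed the same way as to batchMap(), into one named parameter list per new job. The helper below is a minimal sketch of that step only; `make_job_pars()` is a hypothetical name, not part of batchtools, and requiring all arguments to be named and of equal length is an assumption.

```r
# Hypothetical helper (not part of batchtools): build one named parameter
# list per job from vectorised arguments, similar to how batchMap() maps
# over its `...` arguments.
make_job_pars <- function(...) {
  dots <- list(...)
  if (length(dots) == 0L) return(list())
  if (is.null(names(dots)) || any(names(dots) == ""))
    stop("all arguments must be named")
  # Assumption: every vectorised argument has the same length
  if (length(unique(lengths(dots))) != 1L)
    stop("all arguments must have the same length")
  # .mapply() calls list() once per position and keeps the argument names,
  # so element i is e.g. list(x = x[[i]], method = method[[i]])
  .mapply(list, dots, NULL)
}

# Example: parameters for two forgotten jobs
new_pars <- make_job_pars(x = c(10, 20), method = c("a", "b"))
str(new_pars)
```

Each element has the same shape as an entry of reg$defs$job.pars in a batchMap registry, so the result could be fed straight into an append step.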
Here is example code for doing it manually, in case anyone needs it in the meantime:
library(batchtools)
library(data.table)
reg <- loadRegistry(file.dir, writeable = TRUE) # file.dir: path to the existing registry
new_def_id <- max(reg$defs$def.id) + 1L # keep ids integer
new_job_id <- max(reg$status$job.id) + 1L
new_params <- list(some = pars) # named argument list; get the skeleton from reg$defs$job.pars[[1]]
# Add a row to the job definitions (job.pars is a list column holding one named list per job)
reg$defs <- rbind(reg$defs,
  data.table(def.id = new_def_id, job.pars = list(new_params)))
setkey(reg$defs, "def.id") # reset the data.table key after rbind()
# Add a row to the status table; the new job has not been submitted yet
reg$status <- rbind(reg$status,
  data.table(job.id = new_job_id, def.id = new_def_id, submitted = NA_real_,
    started = NA_real_, done = NA_real_, error = NA_character_,
    mem.used = NA_real_, resource.id = NA_integer_, batch.id = NA_character_,
    log.file = NA_character_, job.hash = NA_character_, job.name = NA_character_))
setkey(reg$status, "job.id") # reset the data.table key after rbind()
saveRegistry(reg) # save our updates
Obviously this can be generalized for adding more than one job.
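As a minimal sketch of that generalization, the hypothetical helper below appends several jobs at once. It is not a batchtools function: it relies on the same internal layout of reg$defs and reg$status as the manual code (assumed to match the installed batchtools version) and, like the manual code, gives every new job its own definition.

```r
library(data.table)

# Hypothetical helper (not part of batchtools): append several jobs to a
# batchMap()-style registry by adding rows to reg$defs and reg$status.
# Assumes the internal column layout used in the manual code and that the
# registry is an environment, so the assignments below modify it in place.
append_jobs <- function(reg, params_list) {
  n <- length(params_list)
  if (n == 0L) return(invisible(integer(0L)))
  # Continue numbering after the current maxima (0L covers an empty table)
  new_def_ids <- max(reg$defs$def.id, 0L) + seq_len(n)
  new_job_ids <- max(reg$status$job.id, 0L) + seq_len(n)

  # One definition per job; job.pars is a list column of named lists
  reg$defs <- rbind(reg$defs,
    data.table(def.id = new_def_ids, job.pars = params_list))
  setkeyv(reg$defs, "def.id")

  # Fresh status rows: nothing submitted, started or finished yet
  reg$status <- rbind(reg$status, data.table(
    job.id = new_job_ids, def.id = new_def_ids,
    submitted = NA_real_, started = NA_real_, done = NA_real_,
    error = NA_character_, mem.used = NA_real_, resource.id = NA_integer_,
    batch.id = NA_character_, log.file = NA_character_,
    job.hash = NA_character_, job.name = NA_character_))
  setkeyv(reg$status, "job.id")

  invisible(new_job_ids)
}
```

After appending, the registry would still need to be saved with saveRegistry(reg), and only the new jobs submitted, for example via submitJobs(findNotSubmitted(reg = reg), reg = reg).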
That's funny, I was doing pretty much the same hack yesterday.
+1 for adding jobs to an existing registry
I can include something like this. How would you like the interface to look? Re-running batchMap(), or something like addJobs(params = list())?
I'd be interested in something along the lines of re-running batchMap, but with a different name, e.g. batchMapAddition.
Originally I was thinking that the new jobs should be required to use the same function as the original batchMap call, but maybe that constraint isn't particularly useful.
Also, as mentioned in the title I'm happy to write it, but if you'd like more control over the implementation and want to write it yourself just let me know.
My use case for adding jobs to an existing registry involves dependencies between jobs (#204) that each have a different function, so it would be useful if each job could have its own function.
Lifting all restrictions is probably better than only allowing more jobs to be added for the same function. However, this requires extensive refactoring and is not easy to implement in a backward-compatible fashion. I can give it a shot, but I'm currently quite busy with other projects, so this will probably not get done before January. 😞
If one of you wants to start a PR, here are the most important steps to consider:
- In batchMap, the tuple of user function and more.args must be stored under a unique file name (derived from their hash), and the hash must be stored in reg$defs.
- batchMap just needs to append jobs. If you provide a different function or different more.args, they will automatically be used only for the new jobs. The possibility to "patch" the function of already defined jobs can be added later.
- JobCollections must store the hash to identify the function to load on the slave.
- Job$fun() and Job$pars() must read from the new locations.
- The update routine has to adjust old registries to the new file system structure on first load.
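To make the first and fourth steps more concrete, here is a minimal sketch of content-addressed storage for the (fun, more.args) tuple. The `functions` subdirectory, the helpers `store_fun()` and `load_fun()`, and hashing the uncompressed RDS file with MD5 are all assumptions for illustration, not what batchtools does. The returned hash is what would go into a new column of reg$defs (a hypothetical `fun.hash`) and into the JobCollection.

```r
# Hypothetical sketch of the proposed layout (not current batchtools code):
# each (fun, more.args) tuple is written once to <file.dir>/functions/<hash>.rds
# and jobs refer to it only by that hash.
store_fun <- function(file.dir, fun, more.args = list()) {
  fun.dir <- file.path(file.dir, "functions")
  dir.create(fun.dir, showWarnings = FALSE, recursive = TRUE)
  # Serialize uncompressed so identical tuples give identical bytes
  tmp <- tempfile(tmpdir = fun.dir, fileext = ".tmp")
  saveRDS(list(fun = fun, more.args = more.args), tmp, compress = FALSE)
  hash <- unname(tools::md5sum(tmp))
  dest <- file.path(fun.dir, paste0(hash, ".rds"))
  # Keep exactly one file per distinct tuple
  if (file.exists(dest)) file.remove(tmp) else file.rename(tmp, dest)
  hash
}

# What Job$fun() (or the worker loading a JobCollection) would call instead
# of reading a single shared function file
load_fun <- function(file.dir, hash) {
  readRDS(file.path(file.dir, "functions", paste0(hash, ".rds")))
}
```

Because the hash is computed from the serialized bytes, jobs that pass an identical function and identical more.args share one file, while changing either produces a new file. A closure whose enclosing environment is not the global environment serializes that environment as well, so its hash also changes whenever the environment's contents change.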