future
BACKEND: ClusterMQ as a new backend
I have recently discovered ClusterMQ, which can run R code in SLURM/LSF/etc. jobs. Its biggest advantage over batchtools is that it uses ZMQ to transfer data directly to the distributed jobs. In my experience, the most serious bottleneck in batchtools is its use of a shared file system (NFS) for data transfer, especially if the data is large.
Yes, @mschubert's ClusterMQ is a great candidate for a future backend. I don't have the resources myself right now to also work on that. Having said that, and without having worked with ClusterMQ myself, I don't think it should be too much work to wrap it all up in a ClusterMQFuture - a future backend is mostly a thin layer on top of an existing API.
Related: I'm working on setting up a conformance test suite (e.g. future.tests pkg) that can be used by all future backend pkgs to make sure they got it correct. That is my number one priority before working on new backends.
I fully support this, but unfortunately my time is also quite limited these days.
Given that clustermq::Q() is synchronous, I am wondering what it would take to make an asynchronous ClusterMQFuture. Do we need local background processes to collect the results?
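To illustrate the question, here is a minimal sketch of one way to get asynchronous behavior out of a blocking call: run it in a local background R process and poll for completion. It uses the existing multisession backend of future; blocking_job() is a hypothetical stand-in for a synchronous call such as clustermq::Q(), and none of this is meant as the design of an actual ClusterMQFuture.

```r
library(future)

# Run futures in local background R sessions
plan(multisession, workers = 2)

# Hypothetical stand-in for a blocking call such as clustermq::Q()
blocking_job <- function(x) {
  Sys.sleep(1)  # simulate waiting for remote workers
  x^2
}

# future() returns immediately; the blocking call runs in the background
f <- future(blocking_job(3))

# The main session stays free and can poll without blocking
while (!resolved(f)) {
  Sys.sleep(0.1)
}

# Collect the result once the background process has finished
result <- value(f)
print(result)

# Shut down the background workers
plan(sequential)
```

The trade-off is that each pending call occupies a local R process while it waits, which is the cost the question above is asking about.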
Will future.clustermq somehow allow for heterogeneous transient workers? Some drake users such as @jennysjaarda prefer transient future-based workers over persistent clustermq-based workers, e.g. https://github.com/ropensci/drake/issues/1083#issuecomment-564941327, but there is still the snag that batchtools is slower than clustermq.
Yes, it would be great if clustermq could somehow allow for transient workers!
May I ask what the status of the backend is? Is it still planned to include clustermq as a backend to future, or is there already a way to get that functionality via some workaround?
clustermq is quite a bit more efficient as pointed out in this thread and is thus very interesting for cluster usage.
Still on my wishlist to get to, so, yes, certainly on the todo list. Resources/time is the limiting factor. Indirectly, a big step forward has actually been made since automatic validation of new backends is now in place, cf. future.tests.
PS. I invite anyone to have a look at the very rudimentary first prototype future.clustermq and see if they can give it a push forward (PRs welcome).
@HenrikBengtsson When I wanted to check out the future.clustermq link I got a 404.
Is this still set to private?
It would be awesome to get this going. I just made it public, but please note that it's very rudimentary/prototypical and I have not touched it for a very long time.