
QUESTION: is SSH + BSUB (LSF) available?

Open gabora opened this issue 7 years ago • 13 comments

Hi, I am curious whether I could use batchtools on my local machine and submit jobs through SSH (using bsub with the LSF queueing system) to a remote cluster?

This would be a combination of clusterFunctionsLSF and clusterFunctionsSSH, but I haven't found such a thing implemented yet, right?

I would really like to work on my local machine (in RStudio) and do the computation on the cluster. clusterFunctionsSSH is not a solution, because I am not allowed to run computation- or memory-heavy tasks on the front end (as far as I understand, clusterFunctionsSSH together with the worker does not use the queueing system, but I might have missed something here).

Looking forward to your suggestions. Thanks and kind regards, Attila

gabora avatar Apr 04 '17 09:04 gabora

This is not (yet) possible. Sending commands to a remote system is no problem; runOSCommand already supports that. The IO is the bigger problem: every file must be copied to the remote system, or you must rely on something like sshfs. But even with sshfs, you would need to translate local paths to remote paths and vice versa, which is usually pretty error-prone. I'll keep this open and will probably implement it in a future version.
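To make the path-translation problem concrete, here is a minimal sketch: every local path under the mount point has to be rewritten to its remote counterpart before it can appear in a submitted job script (the mount point and remote root below are hypothetical, not anything batchtools actually does).

```r
# Hypothetical setup: local ~/experiments is sshfs-mounted from
# /home/user/experiments on the cluster.
local.root  <- path.expand("~/experiments")
remote.root <- "/home/user/experiments"

# Rewrite the local prefix into the remote one (fixed = TRUE: plain
# string match, no regex surprises in the path).
to.remote <- function(p) {
  sub(local.root, remote.root, path.expand(p), fixed = TRUE)
}

to.remote("~/experiments/reg/jobs/job1.job")
# "/home/user/experiments/reg/jobs/job1.job"
```

The fragile part is that this mapping has to be applied consistently everywhere a path leaks into a job file or a remote command line, which is exactly why it is error-prone.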

mllg avatar Apr 21 '17 09:04 mllg

Short comment regarding sshfs: the few times I used it to provide a shared filesystem when there was none, I was surprised how well it worked. I had no problems; just saying this in case somebody needs a workaround.

berndbischl avatar Apr 21 '17 09:04 berndbischl

@gabora: is the main reason you want to do this the editor, i.e. RStudio?

berndbischl avatar Apr 21 '17 09:04 berndbischl

Because years ago I was in the same situation, before I began using vim. For me the main hurdle was avoiding manually copying files from the local machine to the server.

So if you use sshfs here in the other direction, to mount your server project code dir on your local machine, not many problems remain:

  • You can edit your R files locally in RStudio. Convenient.
  • OK, you have to do a very few operations in a remote shell, basically submitting jobs and watching their status. That's really not so bad.
  • Not much data is copied, as you just mount the code dir.

Does this help? I like using vim on the server more now. But this approach was really totally OK for me some time ago.
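The workflow above can be sketched as follows; host name and paths are placeholders for your own setup, and the job-submission lines assume LSF (`bsub`/`bjobs`) as in the original question.

```shell
# Mount the server-side project directory locally (paths hypothetical):
mkdir -p ~/project
sshfs user@cluster:/home/user/project ~/project

# Edit files in ~/project with RStudio as usual; every save lands
# directly on the server, so nothing needs to be copied by hand.

# The few remaining remote-shell operations: submit and watch jobs.
ssh user@cluster 'cd project && bsub < job.sh'
ssh user@cluster bjobs

# Unmount when done:
fusermount -u ~/project    # on macOS: umount ~/project
```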

berndbischl avatar Apr 21 '17 09:04 berndbischl

Further side note: Getting reliable exit codes and output for remote commands seems to be possible with the subprocess package. Unfortunately, C++11 is a requirement.

mllg avatar Apr 21 '17 10:04 mllg

Note that I implemented something like this on a ZeroMQ backend (clustermq) back in the day, when I was frustrated with BatchJobs not being able to handle my number of jobs on a shared file system (due to SQLite locking).

The main difference from batchtools is that it doesn't store anything on network-mounted storage and does load balancing, but the implementation is a lot more naive. Nevertheless, it works well for me (and a couple of other people, too).

It also supports sending remote jobs via SSH (so first SSH, then the job submission system). The downside is that it relies on the SSH forwarding not getting disconnected while the jobs run, which is good enough for my purposes but may not be for everyone.

mschubert avatar Apr 24 '17 13:04 mschubert

Thanks a lot for the reply @mllg @berndbischl and @mschubert .

I used @mschubert's clustermq, which works really well, but we had problems with long runs.

@mllg thanks for considering a possible implementation! @berndbischl: I was looking for this option because

  • I like RStudio and seeing the figures immediately,
  • we are not allowed to do computationally heavy stuff in interactive sessions on the cluster,
  • I don't like writing batch code.

I will try sshfs, thanks a lot for the suggestion!

gabora avatar Apr 24 '17 13:04 gabora

Sorry if this is a naive question, but I'm still new to batchtools: how would I set up clusterFunctionsSGE and clusterFunctionsSSH for batching qsub jobs on a remote machine? Here, the same file system is shared across machines, so file copying shouldn't be an issue.

Background: I'm running rstudio-server on a Linux VM that runs on our department server. We have a large SGE compute cluster available, but the VM running rstudio-server is not a submit host. So, in order to submit jobs to the cluster via qsub, we have to ssh to another machine (or a submit-host VM) and run qsub from there.

nick-youngblut avatar Apr 09 '18 19:04 nick-youngblut

This was already implemented for Slurm, and now there is a prototype for SGE. It is kind of buggy though; getting the quoting right is not straightforward.

mllg avatar Apr 11 '18 09:04 mllg

NB: LSF's bsub needs the template provided via STDIN instead of as a file. I currently see no way to accomplish this reliably over SSH.
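To illustrate the constraint (file and host names are hypothetical, and the SSH variant is precisely the part with no reliable solution yet): locally, the rendered template can simply be handed to bsub on STDIN, but over SSH the script has to be streamed through the connection.

```r
# Locally: LSF reads the job script from STDIN, roughly equivalent to
# `bsub < job.tmpl` in a shell.
system2("bsub", stdin = "job.tmpl")

# Over SSH the rendered template would have to be streamed through the
# connection, like `cat job.tmpl | ssh submit-host bsub`:
system2("ssh", c("submit-host", "bsub"), stdin = "job.tmpl")
# Quoting/escaping of the streamed content is what makes this fragile.
```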

mllg avatar Apr 11 '18 09:04 mllg

I'm not sure exactly what you mean by "getting the quoting right". Are there any guidelines/docs to help users avoid problems?

nick-youngblut avatar Apr 11 '18 09:04 nick-youngblut

I'm not sure exactly what you mean by "getting the quoting right".

This was just a comment on the implementation ...

Are there any guidelines/docs to help users avoid problems?

Not yet. I still have to write it down in full detail, but here are the required steps in TL;DR style:

  1. Ensure that you have passwordless SSH (pubkey auth) to the remote submit host
  2. Mount a directory of the remote filesystem, e.g. via sshfs: sshfs user@remote_cluster:/home/user/experiments /home/user/experiments (you need to create both directories first). You need the same filesystem layout on both client and remote, relative to your home directory: ~ will not be expanded (so you can have different login names), and symlinks will not be resolved on the client.
  3. On the client, start batchtools with the configuration for the cluster site, e.g. with makeClusterFunctionsSGE("template-file-on-client", nodename = "remote_cluster")
  4. Create a registry with file.dir pointing to a subdirectory of the mount: makeRegistry(file.dir = "~/experiments/reg")
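Put together, steps 3 and 4 might look like this on the client; the template path, host name, and directory layout are the placeholders from the steps above, not fixed values.

```r
library(batchtools)

# Prerequisites (steps 1 and 2, done once in a local shell):
#   ssh-copy-id user@remote_cluster
#   sshfs user@remote_cluster:/home/user/experiments ~/experiments

# Step 3: cluster functions that submit over SSH to the remote host.
cf <- makeClusterFunctionsSGE("template-file-on-client",
                              nodename = "remote_cluster")

# Step 4: registry on the mounted directory; the path must be the same
# relative to the home directory on both client and remote.
reg <- makeRegistry(file.dir = "~/experiments/reg")
reg$cluster.functions <- cf
```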

mllg avatar Apr 11 '18 12:04 mllg

I'm quite close to making this work, but it seems like a path is still being expanded in the Slurm SSH submission. I get the following error:

submitJobs(ids,
+            resources = list(walltime="00:30:00",
+                             memory = "4gb",
+                             ncpus=1))
Submitting 20 jobs in 3 chunks using cluster functions 'Slurm' ...
Error: Fatal error occurred: 101. Command 'sbatch' produced exit code 1. Output: 'sbatch: error: Unable to open file /home/rob/princescratch/apml/test/jobs/jobd7fe014f580d5d26c586d429b4ba7a10.job'

where /home/rob/ is the home directory on the local machine where I'm submitting from.

The setup is as follows:

reg <- makeExperimentRegistry(file.dir = "~/princescratch/apml/test",
                              packages=c("data.table","foreach",
                                         "doMC","findSplit"),
                              conf.file = NA)

cf <- makeClusterFunctionsSlurm(template = "~/princescratch/apml/slurm-prince.tmpl",
                                nodename = "rjr@mycluster",
                                array.jobs = TRUE)

reg$cluster.functions <- cf

Any suggestions on where to look? The template seems to be generating a reasonable job file with a relative path for the Rscript call. Thanks.

rrichmond avatar Jul 23 '19 00:07 rrichmond