
Cluster workflow feature to allow shell commands or script to run before remote server setup (e.g. slurm) (wrap install script)

Open wwarriner opened this issue 6 years ago • 117 comments

I want to be able to connect to our institution's cluster using VS Code Remote SSH, with the server running on a compute node instead of the login node. The preferred workflow is to SSH into the login node, use a command to allocate a job and spin up an interactive shell on a compute node, and then run any further tasks from there. VS Code Remote SSH doesn't appear to have a feature that facilitates this workflow. I want to be able to inject the spin-up command immediately after SSH'ing into the cluster, but before the VS Code server is set up on the cluster and before any other tasks are run.

wwarriner avatar Oct 24 '19 18:10 wwarriner

I managed to modify the extension.js file in the following way:

CTRL+F for "bash", then change the string literal "bash" to "bash -c \"MY_COMMAND bash\"".

I've confirmed that this correctly starts the VS Code Remote SSH server on a compute node. Now I am running into a port-forwarding issue, possibly related to issue #92. Our compute nodes have the ports used by VS Code Remote SSH disabled, so there isn't an easy way around this issue.

Thanks for the hard work on this so far! This extension has extraordinary potential. Being able to run and modify a Jupyter notebook remotely on our cluster, while using intellisense and gitlens, AND conda environment detection and dynamic swapping, all in a single application for FREE is incredible.

wwarriner avatar Oct 24 '19 22:10 wwarriner

Our compute nodes have the ports used by VS Code Remote SSH disabled, so there isn't an easy way around this issue.

Do you mean that port forwarding for ssh is disabled on that server? Or are you able to forward some other port over an ssh connection to that server?

roblourens avatar Oct 27 '19 22:10 roblourens

Port forwarding for SSH is not disabled on any part of our cluster. I am not intentionally attempting to forward any other ports to the server. I was using remote.SSH.enableDynamicForwarding and remote.SSH.useLocalServer. Your questions gave me the idea to disable those options. I can't determine if that has helped because my earlier assertion was incorrect. I can't get the server to run on a compute node.

To address that issue, and to clarify our workflow some, we are using Slurm. It is highly preferred to have tasks running within a job context so that login node resources aren't being consumed. To do that, we create a job using srun (or one of its siblings) with appropriate resource request parameters. Any commands we want to run are provided as the final argument to srun. All calls to srun must have a command, apparently because it uses execve() to invoke them. If no command is passed, srun fails with an error message. With that in mind, setting up the VS Code server on the remote would have to be funneled through a call to srun. Any other method of invocation (such as bash -c) will result in commands being run outside the job context, and thus on the login node. Naively modifying the bash invocation does not work, apparently because srun never receives any arguments. It isn't clear to me how the server installer gets invoked and set up, so I can't offer any suggestions.

As a side note, it is also possible to provide the arguments --pty bash to srun to get a terminal within the job context on a node allocated for that job. Looking at #1671, specifically here: it seems like it should be possible to adjust the invocation of bash -ilc to do additional things (found by CTRL+F). I've tried testing this, but as far as I can tell that code is never called (I used echo for debugging).

wwarriner avatar Oct 28 '19 23:10 wwarriner

What code do you mean by "that code"? I don't think the issue you point to is related.

We run the installer script essentially like echo <installer script here> | ssh hostname bash. There is an old feature request to be able to run a custom script before running the installer. I am not sure whether that would help you here, is there a way with Slurm to run a command, then have the rest of the same script run in a job context?
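The shape of that invocation can be sketched locally, with a plain `bash` standing in for `ssh hostname bash` (the script text here is made up for illustration):

```shell
# The installer text travels over stdin and is executed by the
# (normally remote) shell on the other end of the pipe.
script='echo "server setup running as $(whoami)"'
echo "$script" | bash
```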

It sounds more like you need a way to wrap the full installer script in a custom command, like srun "<installer script here>", is that right?
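For illustration, such wrapping might look like the sketch below, with a user-configurable wrapper prefix. The Slurm flags in the comment are assumptions (they would be site-specific), and plain `sh -c` stands in for `srun` so the sketch runs anywhere:

```shell
#!/bin/sh
# WRAPPER is the user-supplied prefix the installer gets funneled through;
# with Slurm it might be something like "srun --ntasks=1 --pty bash -c"
# (flags are assumptions). Here we default to a plain shell.
WRAPPER="${WRAPPER:-sh -c}"
INSTALLER='echo "installer ran on $(hostname)"'
$WRAPPER "$INSTALLER"
```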

roblourens avatar Oct 28 '19 23:10 roblourens

Yes to your last question, ideally with the ability to customize the wrapping command.

wwarriner avatar Oct 28 '19 23:10 wwarriner

This would be an important feature for vscode-remote. I am currently trying to use VS Code to run some interactive Python code on a shared cluster, and the only way of doing it is by using the srun command of Slurm. I'll try to find a workaround, but I think there really is a use case for this feature request.

nicocarbone avatar Nov 06 '19 14:11 nicocarbone

I've got the same issue, but using LSF instead of Slurm. As @roblourens points out here: https://github.com/microsoft/vscode-remote-release/issues/1829#issuecomment-553525298, just running the install script and starting the server only solves half the problem. Once the server is started, I surmise that VS Code will still try SSHing directly into the desired (login-restricted) machine to discover what port the VS Code server picked, as well as to start the new terminals that show up in the GUI.

Basically, the only way this can work is if all subprocesses for servers and user terminals are strictly forked children from the original seed shell acquired from LSF/SLURM/whatever job manager you are using. A hacky workaround may be to use something like Paramiko to start a mini-SSH server from the seed shell and then login to this mini server directly from VS Code (assuming there isn't a firewall blocking you, but obviously reverse SSH tunnels can be used to get around that).

daferna avatar Nov 13 '19 23:11 daferna

Another possible resolution to this issue is by enabling a direct connection to the remote server. That is, the user would:

  1. Launch vscode-server on a remote (possibly login-restricted) host.
  2. Enter the remote server address and port in vscode, and connect to it.

That way, no ssh is required at all and it can work on login-restricted hosts.

benfei avatar Dec 22 '19 07:12 benfei

A slight variant on this: I would like to be able to get the target address for SSH from a script (think cat'ing a file that is semi-frequently updated with the address of a dynamic resource). Currently I am using a ProxyCommand configured in my ssh config, but that has the disadvantage of requiring a second process.
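For illustration, such a ProxyCommand setup might look like the sketch below in `~/.ssh/config`; the host alias, the address file path, and the use of `nc` are all assumptions, not a tested recipe:

```
Host dynamic-cluster
    # Read the frequently-updated address from a file at connect time.
    ProxyCommand sh -c 'exec nc "$(cat ~/.cluster-address)" 22'
    User username
```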

ihnorton avatar Jan 13 '20 21:01 ihnorton

I want to be able to connect to our institution's cluster using VS Code Remote SSH, with the server running on a compute node instead of the login node. The preferred workflow is to SSH into the login node, use a command to allocate a job and spin up an interactive shell on a compute node, and then run any further tasks from there. VS Code Remote SSH doesn't appear to have a feature that facilitates this workflow. I want to be able to inject the spin-up command immediately after SSH'ing into the cluster, but before the VS Code server is set up on the cluster and before any other tasks are run.

@wwarriner Is the issue you are referring to the same one as the one on this stack overflow SO question?

It sounds like we are having a similar problem, when I spin an interactive job and try to run my debugger, I can't do it because it goes back to the head node and tries to run things there.

https://stackoverflow.com/questions/60141905/how-to-run-code-in-a-debugging-session-from-vs-code-on-a-remote-using-an-interac

brando90 avatar Feb 09 '20 22:02 brando90

The problem is more serious than I thought. Not only can't I run the debugger in the interactive session, I can't even "Run Without Debugging" without it switching to the Python Debug Console on its own. That means I have to run things manually with python main.py, but that won't allow me to use the variable pane... which is a big loss! (I was already willing to lose breakpoints by using pdb, which I wasn't a super big fan of, but fine while things get fixed...)

What I am doing is switching my terminal to the conoder_ssh_to_job session and then clicking the button Run Without Debugging (or ^F5 or Control + fn + F5). Although I made sure to be on the interactive session at the bottom of my integrated terminal, it goes by itself to the Python Debug Console window/pane, which is not connected to the interactive session I requested from my cluster...

brando90 avatar Feb 10 '20 16:02 brando90

Am I reading this right that currently the only way to have the language server run on a compute node rather than the head/login node is to modify extension.js? Or is there a different preferred solution? I'm also getting weird port conflicts when I modify extension.js.

(I'm also using slurm and the python language server eating up 300GB on the head node disrupts the whole department).

daeh avatar Feb 20 '20 06:02 daeh

I'm curious if this is on the roadmap for the near future. With my university going entirely remote for the foreseeable future, being able to use this extension to work on the cluster would be absolutely amazing.

daeh avatar Mar 19 '20 23:03 daeh

Yes, I also want this feature a lot with universities going remote due to COVID-19

brando90 avatar Mar 24 '20 22:03 brando90

Another possible resolution to this issue is by enabling a direct connection to the remote server. That is, the user would:

  1. Launch vscode-server on a remote (possibly login-restricted) host.
  2. Enter the remote server address and port in vscode, and connect to it.

That way, no ssh is required at all and it can work on login-restricted hosts.

how do you do that? Have you tried it?

brando90 avatar Mar 24 '20 22:03 brando90

No capacity to address this in the near future but I am interested to hear how the cluster setup works for other users - if anyone is not using slurm/srun as described above please let me know what it would take to make this work for you.

roblourens avatar Mar 25 '20 00:03 roblourens

I put this to settings.json:

"terminal.integrated.shellArgs.linux": [
    "-c",
    "export FAF=FEF ; exec $SHELL -l"
]


After that, every Linux shell will have the FAF env variable (what I wanted); furthermore, thanks to the exec command, no new process is created!
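Why no new process is created can be sketched with plain `sh` (reusing the FAF/FEF names from the example above): `exec` replaces the shell in place, so the PID stays the same while the exported variable is inherited.

```shell
# The outer shell prints its PID, then exec-replaces itself; the inner
# shell prints the same PID plus the inherited variable.
sh -c 'export FAF=FEF; echo "pid before: $$"; exec sh -c "echo pid after: \$\$; echo FAF=\$FAF"'
```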

I hope this will be useful for someone :D !

alfonzso avatar Apr 08 '20 21:04 alfonzso

I guess this is related. I would like VS Code clients (e.g., the Julia client) to have an option to start in the Slurm job I am currently in and not on the login node.

Nosferican avatar Apr 16 '20 14:04 Nosferican

I am able to get the Julia language server working by adding

ml >/dev/null 2>&1 && ml julia

to my ~/.bashrc.

For Slurm jobs, I have to

  1. Start the Julia client.
  2. From the Julia client, run the ijob command line.
  3. Start Julia again from that shell.

Would be great to at least start the Julia client from the job shell as a Julia client session.

One issue with that approach is that it starts Julia from the shell and not from the client, so it misses out on a few features such as vscodedisplay for displaying tabular data.

Nosferican avatar May 14 '20 11:05 Nosferican

I worked on this for over a day and may have got a somewhat working solution, inspired by @Nosferican's idea, to run the command-line job from within the Julia client. I didn't have to add anything to my ~/.bashrc for it to work.

One caveat though, like he said, is that I couldn't view dataframes using the vscodedisplay function, nor am I able to view plots. But I suppose one hacky workaround for plots is to save them and open them alongside in VS Code itself. The screenshot below shows how it worked:

This was using Julia, but I'm sure a similar setup could be followed for Python/R, i.e. by invoking shell command features and running srun from within Julia/Python/R, like this:

srun -c <ncpus> --pty julia

As @Nosferican said though, and as shown in my screenshot, images couldn't be displayed. Any ideas?

P.S. BTW before trying out this, I've tried all sorts of ways to get around this today, for e.g. by adding this to my settings.json:

"terminal.integrated.shellArgs.linux": [
    "-c",
    "srun -c 6 --pty bash",
]

I also tried to work around it by using a tmux setup running on a compute node, hoping any new Julia/Python/R instance would also use the same session. The tmux setup would be something like this: https://github.com/julia-vscode/julia-vscode/issues/426

But using that method, I could only get Python to execute code in the terminal; it doesn't work for its interactive Jupyter view, nor for R and Julia.

Don't know enough about vscode's integrated terminal setup to manipulate the ports either.

srgk26 avatar May 17 '20 19:05 srgk26

Another use case would be to transfer code to the server while SSHing into it: run the command rsync (automatically) ... on the local machine before opening the connection.

ahmednrana avatar May 21 '20 10:05 ahmednrana

Any chance of getting this out of backlog and into a milestone, @roblourens? It would be amazing to be able to use this extension.

daeh avatar Jun 17 '20 18:06 daeh

Similar problem: I tried to request an interactive shell on my cluster via my login node. Unfortunately this causes the VS Code extension to time out.

I'm using LSF and I run: bsub -Is "zsh" in my .bashrc.

I'm guessing it's a port-forwarding problem between the client and the server-hosted extension files?

ctr26 avatar Jun 17 '20 21:06 ctr26

Has there been any progress on this? Can we now ssh directly to an interactive session and have it work? (https://stackoverflow.com/questions/60141905/how-to-run-code-in-a-debugging-session-from-vs-code-on-a-remote-using-an-interac)

brando90 avatar Jun 30 '20 16:06 brando90

@roblourens There seem to be about 37 non-bugs in the backlog milestone. Could you give a rough estimate of how high this issue ranks in terms of priorities? For example: next release, the one after, or end of the year?

Nosferican avatar Jun 30 '20 16:06 Nosferican

Has there been any progress on this? Can we now ssh directly to an interactive session and have it work? (https://stackoverflow.com/questions/60141905/how-to-run-code-in-a-debugging-session-from-vs-code-on-a-remote-using-an-interac)

Update: There's a huge problem with this approach. Please see the discussion below by @Nosferican.


I confirm the answer in the StackOverflow works for me! Thank you and the author of the answer!

Although I found there might need to be a bit of modification from the original answer: mainly, I think we need to add username@ before the login server name (sorry, I'm not able to comment there since my StackOverflow account is new).

A recap of the procedure:

  1. Submit an interactive job (e.g. salloc for slurm), get the computing node assigned.
  2. In VS Code, add a remote SSH target using ssh -J [email protected] username@nodeXXX. The -J option will set "ProxyJump" in the ~/.ssh/config file, and it will look like:
Host MyCluster
    HostName nodeXXX
    ProxyJump [email protected]
    User username
  3. The setup is ready; you can open this SSH target in VS Code. You might need to enter your password twice, first for the login node and then for the computing node. Now you should be able to work on the remote computing node!

A reminder: the key is to set the ~/.ssh/config correctly and be aware of the jump node's name. Remember to change nodeXXX to the newly assigned computing node every time.

p.s. It originally didn't work on one of my clusters somehow. But after I used an SSH key file and specified the IdentityFile in the ~/.ssh/config, the problem was solved. So I suggest using an SSH key and setting the ~/.ssh/config as:

Host MyCluster
    HostName nodeXXX
    ProxyJump [email protected]
    User username
    IdentityFile ~/.ssh/my_key

This saves you from entering your password twice every time anyway.

Lucecpkn avatar Aug 02 '20 22:08 Lucecpkn

I tried the solution. I am able to start VS Code on the computing node, but it returns a shell on the computing node, not in the Slurm job. Is there a way to have the VS Code shell / language servers step into the job?

Nosferican avatar Aug 04 '20 04:08 Nosferican

I tried the solution. I am able to start VS Code on the computing node, but it returns a shell on the computing node, not in the Slurm job. Is there a way to have the VS Code shell / language servers step into the job?

Sorry, I'm not sure what you mean. Did you try to open the Explorer in VSCode and work on some code scripts? I think the language server will step in automatically when you work on a certain code script.

Lucecpkn avatar Aug 05 '20 15:08 Lucecpkn

Aye. The solution works in the sense that I can connect to the compute nodes, but I am not inside the Slurm job, so I don't have access to the resources allocated for it. I can start coding and the language server steps in, but I am now consuming resources on that node that are not tracked by the cluster job manager through Slurm.

Nosferican avatar Aug 05 '20 15:08 Nosferican

The solution isn't super practical for me as the nodes get allocated with arbitrary names

Ideally this proxy jumping would be automated
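One conceivable automation, sketched with made-up paths and a hard-coded demo node name (in practice the node name would have to come from the scheduler, e.g. a squeue query run on the login node):

```shell
# Rewrite the HostName line of an ssh config stanza to point at the
# newly allocated node. The /tmp path and node name are illustrative.
cat > /tmp/demo_ssh_config <<'EOF'
Host MyCluster
    HostName nodeXXX
    User username
EOF
node="node042"   # assumption: queried from the scheduler after allocation
sed -i "s/^[[:space:]]*HostName .*/    HostName ${node}/" /tmp/demo_ssh_config
grep HostName /tmp/demo_ssh_config
```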

ctr26 avatar Aug 05 '20 15:08 ctr26