vscode-remote-release
Cluster workflow feature to allow shell commands or script to run before remote server setup (e.g. slurm) (wrap install script)
I want to be able to connect to our institution's cluster using VS Code Remote SSH with the server running on a compute node instead of the login node. The preferred workflow is to SSH into the login node, use a command to allocate a job and spin up an interactive shell on a compute node, and then run any further tasks from there. VS Code Remote SSH doesn't appear to have a feature that facilitates this workflow. I want to be able to inject the spin-up command immediately after SSH'ing into the cluster, but before the VS Code server is set up and before any other tasks are run.
I managed to modify the extension.js file in the following way:
Ctrl+F -> "bash"
Change the string literal "bash" to "bash -c \"MY_COMMAND bash\""
I've confirmed that this correctly starts the VS Code Remote SSH server on a compute node. Now I am running into a port-forwarding issue, possibly related to issue #92. Our compute nodes have the ports used by VS Code Remote SSH disabled, so there isn't an easy way around this issue.
Thanks for the hard work on this so far! This extension has extraordinary potential. Being able to run and modify a Jupyter notebook remotely on our cluster, while using IntelliSense and GitLens, AND conda environment detection and dynamic swapping, all in a single application for FREE is incredible.
> Our compute nodes have the ports used by VS Code Remote SSH disabled, so there isn't an easy way around this issue.
Do you mean that port forwarding for ssh is disabled on that server? Or are you able to forward some other port over an ssh connection to that server?
Port forwarding for SSH is not disabled on any part of our cluster. I am not intentionally attempting to forward any other ports to the server. I was using remote.SSH.enableDynamicForwarding and remote.SSH.useLocalServer; your questions gave me the idea to disable those options. I can't determine whether that has helped, because my earlier assertion was incorrect: I can't actually get the server to run on a compute node.
To address that issue, and to clarify our workflow some: we are using Slurm. It is highly preferred to have tasks running within a job context so that login node resources aren't being consumed. To do that, we create a job using srun (or one of its siblings) with appropriate resource request parameters. Any commands we want to run are provided as the final argument to srun. All calls to srun must have a command, apparently because it uses execve() to invoke them; if no command is passed, srun fails with an error message.

With that in mind, setting up the VS Code server on the remote would have to be funneled through a call to srun. Any other method of invocation (such as bash -c) will result in commands being run outside the job context, and thus on the login node. Naively modifying the bash invocation does not work, apparently because srun never receives any arguments. It isn't clear to me how the server installer gets invoked and set up, so I can't offer any suggestions.
As a side note, it is also possible to pass --pty bash to srun to get a terminal within the job context on a node allocated for that job. Looking at #1671, specifically here, it seems like it should be possible to adjust the invocation of bash -ilc to do additional things (found by Ctrl+F). I've tried testing this, but as far as I can tell that code is never called (I used echo for debugging).
What code do you mean by "that code"? I don't think the issue you point to is related.
We run the installer script essentially like echo <installer script here> | ssh hostname bash. There is an old feature request to be able to run a custom script before running the installer. I am not sure whether that would help you here, is there a way with Slurm to run a command, then have the rest of the same script run in a job context?
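A sketch of that pattern, with the srun wrapping purely illustrative (the flags and resource requests would depend on the site):

```shell
# The installer is effectively piped into a shell on the remote:
#   echo "<installer script>" | ssh hostname bash
# Wrapping it in a job context would mean replacing the final "bash",
# e.g. with a hypothetical srun invocation:
#   echo "<installer script>" | ssh hostname "srun --ntasks=1 bash"
# The piping half of the pattern can be exercised locally:
echo 'echo "installer ran under: $0"' | bash
# prints: installer ran under: bash
```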
It sounds more like you need a way to wrap the full installer script in a custom command, like srun "<installer script here>". Is that right?
Yes to your last question, ideally with the ability to customize the wrapping command.
This would be an important feature for vscode-remote. I am currently trying to use VS Code to run some interactive Python code on a shared cluster, and the only way of doing it is by using Slurm's srun command. I'll try to find a workaround, but I think there really is a use case for this feature request.
I've got the same issue, but using LSF instead of Slurm. As @roblourens points out here: https://github.com/microsoft/vscode-remote-release/issues/1829#issuecomment-553525298 just running the install script and starting the server only solves half the problem. Once the server is started, I surmise that VS Code will still try SSHing directly into the desired (login-restricted) machine to discover what port the remote server picked, as well as to start the new terminals that show up in the GUI.
Basically, the only way this can work is if all subprocesses for servers and user terminals are strictly forked children from the original seed shell acquired from LSF/SLURM/whatever job manager you are using. A hacky workaround may be to use something like Paramiko to start a mini-SSH server from the seed shell and then login to this mini server directly from VS Code (assuming there isn't a firewall blocking you, but obviously reverse SSH tunnels can be used to get around that).
Another possible resolution to this issue is by enabling a direct connection to the remote server. That is, the user would:
- Launch vscode-server on a remote (possibly login-restricted) host.
- Enter the remote server address and port in vscode, and connect to it.
That way, no ssh is required at all and it can work on login-restricted hosts.
A slight variant on this: I would like to be able to get the target address for SSH from a script (think cat'ing a file that is semi-frequently updated with the address of a dynamic resource). Currently I am using a ProxyCommand configured in sshconfig, but that has the disadvantage of requiring a second process.
> I want to be able to connect to our institution's cluster using VS Code Remote SSH with the server running on a compute node instead of the login node. The preferred workflow is to SSH into the login node, use a command to allocate a job and spin up an interactive shell on a compute node, and then run any further tasks from there.
@wwarriner Is the issue you are referring to the same as the one in this Stack Overflow question?
It sounds like we are having a similar problem: when I spin up an interactive job and try to run my debugger, I can't, because it goes back to the head node and tries to run things there.
https://stackoverflow.com/questions/60141905/how-to-run-code-in-a-debugging-session-from-vs-code-on-a-remote-using-an-interac
The problem is more serious than I thought. Not only can't I run the debugger in the interactive session, I can't even "Run Without Debugging" without it switching to the Python Debug Console on its own. That means I have to run things manually with python main.py, but that won't allow me to use the variable pane... which is a big loss! (I was already willing to lose breakpoints by using pdb, which I wasn't a super big fan of, but okay, fine, while things get fixed...)
What I am doing is switching my terminal to the conoder_ssh_to_job terminal and then clicking Run Without Debugging (or ^F5, i.e. Control + fn + F5). Although I made sure to be on the interactive session at the bottom of my integrated terminal, it goes by itself to the Python Debugger window/pane, which is not connected to the interactive session I requested from my cluster...
Am I reading this right that currently the only way to have the language server run on a compute node rather than the head/login node is to modify extension.js? Or is there a different preferred solution? I'm also getting weird port conflicts when I modify extension.js.
(I'm also using Slurm, and the Python language server eating up 300 GB on the head node disrupts the whole department.)
I'm curious if this is on the roadmap for the near future. With my university going entirely remote for the foreseeable future, being able to use this extension to work on the cluster would be absolutely amazing.
Yes, I also really want this feature, with universities going remote due to COVID-19.
> Another possible resolution to this issue is by enabling a direct connection to the remote server. That is, the user would:
> - Launch vscode-server on a remote (possibly login-restricted) host.
> - Enter the remote server address and port in vscode, and connect to it.
> That way, no ssh is required at all and it can work on login-restricted hosts.
how do you do that? Have you tried it?
No capacity to address this in the near future, but I am interested to hear how the cluster setup works for other users. If anyone is not using slurm/srun as described above, please let me know what it would take to make this work for you.
I put this in settings.json:
"terminal.integrated.shellArgs.linux": [
    "-c",
    "export FAF=FEF; exec $SHELL -l"
]

After that, every Linux shell will have the FAF env variable (what I wanted); furthermore, with the exec command, no new process is created!
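The export-then-exec trick can be reproduced in a plain terminal outside VS Code:

```shell
# Export a variable, then exec a replacement shell: the new shell inherits
# the environment, and exec means no extra intermediate process survives.
bash -c 'export FAF=FEF; exec bash -c "echo FAF=$FAF"'
# prints: FAF=FEF
```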
I hope this will be useful for someone :D !
I guess this is related. I would like VS Code clients (e.g., the Julia client) to have an option to start in the Slurm job I am currently in rather than on the login node.
I am able to get the Julia language server by having added
ml >/dev/null 2>&1 && ml julia
to my ~/.bashrc.
For Slurm jobs, I have to:
- Start the Julia client.
- From the Julia client, run the ijob command line.
- Start Julia again from that shell.
Would be great to at least be able to start the Julia client from the job shell as a Julia client session.
One issue with that approach is that it starts Julia from the shell and not the client, so it misses out on a few features such as vscodedisplay for displaying tabular data.
I tried to work on this for over a day, and I may have a somewhat working solution, inspired by @Nosferican's idea of running the command-line job from within the Julia client. I didn't have to add anything to my ~/.bashrc for it to work.
One caveat, as he said, is that I couldn't view dataframes using the vscodedisplay function, nor am I able to view plots. But I suppose one hacky workaround for plots is to save them and open them alongside in VS Code itself. The screenshot below shows how it worked:

This was using Julia, but I'm sure a similar setup could be followed through Python/R, i.e. by invoking shell command features and running srun from within Julia/Python/R, like this:
srun -c <ncpus> --pty julia
As @Nosferican said though, and as shown in my screenshot, images couldn't be displayed. Any ideas?
P.S. Before trying this out, I tried all sorts of ways to get around this today, e.g. by adding this to my settings.json:
"terminal.integrated.shellArgs.linux": [
    "-c",
    "srun -c 6 --pty bash"
]
I also tried to work around it using a tmux setup running on a compute node, hoping any new Julia/Python/R instance would use that same session. The tmux setup would be something like this: https://github.com/julia-vscode/julia-vscode/issues/426
But using that method, I could only get Python to execute code in the terminal; it doesn't work for its interactive Jupyter view, nor for R and Julia.
Don't know enough about vscode's integrated terminal setup to manipulate the ports either.
Another use case would be transferring code to the server while SSHing into it: run an rsync command (automatically) ... on the local machine before opening the connection.
Any chance of getting this out of the backlog and into a milestone, @roblourens? It would be amazing to be able to use this extension.
Similar problem: I tried to request an interactive shell on my cluster via my login node. Unfortunately this causes the VS Code extension to time out.
I'm using LSF and I run:
bsub -Is "zsh"
in my .bashrc.
I'm guessing it's a port-forwarding problem between the client and the server-hosted extension files?
Has there been any progress on this? Can we now ssh directly to an interactive session and have it work? (https://stackoverflow.com/questions/60141905/how-to-run-code-in-a-debugging-session-from-vs-code-on-a-remote-using-an-interac)
@roblourens There seem to be about 37 non-bugs in the backlog milestone. Could you give a rough estimate of how high this issue ranks in terms of priorities? For example: next release, the one after, or end of the year?
> Has there been any progress on this? Can we now ssh directly to an interactive session and have it work? (https://stackoverflow.com/questions/60141905/how-to-run-code-in-a-debugging-session-from-vs-code-on-a-remote-using-an-interac)
Update: There's a huge problem with this approach. Please see the discussion below by @Nosferican.
I confirm the answer in the Stack Overflow post works for me! Thank you and the author of the answer!
Although I found there might need to be a bit of modification from the original answer; mainly, I think we need to add username@ before the login server name (sorry, I'm not able to comment there since my Stack Overflow account is new).
A recap of the procedure:
- Submit an interactive job (e.g. salloc for Slurm) and note the computing node assigned.
- On VS Code, add a remote SSH target using ssh -J username@login-node username@nodeXXX. The -J option will set "ProxyJump" in the ~/.ssh/config file, which will look like:
Host MyCluster
  HostName nodeXXX
  ProxyJump username@login-node
  User username
- The setup is ready; you can open this SSH target in VS Code. You might need to enter your password twice: first for the login node, then for the computing node. Now you should be able to work on the remote computing node!!
A reminder: the key is to set up ~/.ssh/config correctly; be aware of the jump node's name, and remember to change the nodeXXX name to the newly assigned computing node every time.
p.s. It originally didn't work on one of my clusters somehow, but after I used an SSH key file and specified the IdentityFile in ~/.ssh/config, the problem was solved.
So, I suggest using an SSH key and setting ~/.ssh/config as:
Host MyCluster
  HostName nodeXXX
  ProxyJump username@login-node
  User username
  IdentityFile ~/.ssh/my_key
This saves you from entering your password twice every time anyway.
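Since nodeXXX changes with every allocation, the HostName line could be rewritten by a small helper; a sketch, assuming the block is named MyCluster (a throwaway file stands in for ~/.ssh/config, and the node name is made up):

```shell
# Rewrite the HostName of the MyCluster block after a new allocation.
CONFIG=$(mktemp)                 # stand-in for ~/.ssh/config
cat > "$CONFIG" <<'EOF'
Host MyCluster
  HostName nodeXXX
  ProxyJump username@login-node
  User username
EOF
NODE=node042                     # in practice, read from salloc/squeue output
# Restrict the substitution to the MyCluster block (up to the next Host line).
sed -i "/^Host MyCluster/,/^Host /s/^\( *HostName \).*/\1$NODE/" "$CONFIG"
grep HostName "$CONFIG"          # prints the updated HostName line
```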
I tried the solution. I am able to start VS Code on the computing node, but it returns a shell on the computing node that is not in the Slurm job. Is there a way to have the VS Code shell / language servers step into the job?
> I tried the solution. I am able to start VS Code on the computing node, but it returns a shell on the computing node that is not in the Slurm job. Is there a way to have the VS Code shell / language servers step into the job?
Sorry, I'm not sure what you mean. Did you try to open the Explorer in VSCode and work on some code scripts? I think the language server will step in automatically when you work on a certain code script.
Aye. The solution works in the sense that I can connect to the compute nodes, but I am not inside the Slurm job, so I don't have access to the resources allocated for it. I can start coding and the language server steps in, but I am now consuming resources on that node that are not tracked by the cluster job manager through Slurm.
The solution isn't super practical for me, as the nodes get allocated with arbitrary names.
Ideally this proxy jumping would be automated