batchspawner
Add 'cancel spawn' functionality
Proposed change
I would like to give our HPC cluster users the ability to cancel the spawning process. If they have selected the wrong resources, they can find themselves stuck in the spawning state with no way to cancel it and spawn another job with different resources.
Alternative options
The only way to cancel a pending job at the moment is to SSH into the cluster and run `scancel <jobid>` (Slurm).
Who would use this feature?
All our HPC users.
(Optional): Suggest a solution
A button with "cancel spawn" functionality. It would simply run `scancel <jobid>` (Slurm), or the comparable command for Torque etc.
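To make the suggestion concrete, here is a minimal sketch of what such a cancel action could do behind the button. The helper names (`build_cancel_command`, `cancel_job`) and the scheduler mapping are hypothetical illustrations, not part of batchspawner's API; only the `scancel`/`qdel` command names are the real Slurm/Torque CLIs.

```python
import subprocess

# Hypothetical mapping from scheduler name to its job-cancel command.
# Slurm uses scancel, Torque/PBS uses qdel; other schedulers would be added here.
CANCEL_COMMANDS = {
    "slurm": ["scancel"],
    "torque": ["qdel"],
}

def build_cancel_command(scheduler, job_id):
    """Return the argv list that would cancel `job_id` on the given scheduler."""
    try:
        base = CANCEL_COMMANDS[scheduler]
    except KeyError:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return base + [str(job_id)]

def cancel_job(scheduler, job_id):
    """Run the scheduler's cancel command; raises CalledProcessError on failure."""
    subprocess.run(build_cancel_command(scheduler, job_id), check=True)
```

In batchspawner itself this would presumably reuse the per-scheduler `batch_cancel_cmd` each spawner subclass already defines, rather than a separate table like this.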
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
This is a nice idea, and now that you mention it, it would be useful.
The problem is that the spawn process and the page presented to the user is controlled by the hub, so somehow JupyterHub would have to be adjusted to have these options, and then batchspawner could use it.
Other relevant issues I can find:
- https://github.com/jupyterhub/jupyterhub/issues/2975 - one can't stop a pending server. This would presumably need to be solved first.
So, I propose we transfer this to the JupyterHub repository. Any other comments about this? (Perhaps we can discuss at our monthly meeting)
If you could initiate that, could you please take it over? Thanks!
Did this feature request get created on the JupyterHub repo? I can't seem to find it.
So what are the alternatives at the moment?
- letting the jobs run forever (sigh), potentially filling the cluster if you have more users than nodes
- setting a wallclock limit and letting Slurm kill the job, which is a poor experience for users whose work gets killed mid-session
- enabling culling? Are there any downsides to the latter?
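For the culling option, a minimal sketch of enabling the separately-installed `jupyterhub-idle-culler` package as a hub-managed service in `jupyterhub_config.py` could look like this. The timeout value and service name are assumptions for illustration:

```python
# jupyterhub_config.py -- sketch assuming `pip install jupyterhub-idle-culler`.
c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "command": [
            "python3", "-m", "jupyterhub_idle_culler",
            "--timeout=3600",  # cull servers idle for more than 1 hour (assumed value)
        ],
    }
]

# Grant the service the permissions it needs (JupyterHub >= 2.0 uses scoped roles).
c.JupyterHub.load_roles = [
    {
        "name": "idle-culler",
        "services": ["idle-culler"],
        "scopes": [
            "list:users",
            "read:users:activity",
            "read:servers",
            "delete:servers",
        ],
    }
]
```

Note that culling only addresses idle servers after they start; it does not help a user cancel a job that is still pending in the queue, which is what this issue is about.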
This is the relevant JupyterHub issue, deleting is effectively the same as cancelling: https://github.com/jupyterhub/jupyterhub/issues/2975