batchspawner
Slurm batch script gets cached in DB. Changes in config file are ignored
I am running batchspawner-0.9.0.dev0 with a recent development version of jupyterhub (1.0.0b1) and Optionsspawner 0.1.0.
After running once with a certain user, changes to the Slurm batch script inside the jupyterhub-config.py file no longer take effect, even after I restart the server. The logs show that the old Slurm script is still used for that user, so the information seems to persist somewhere. After I delete the sqlite DB and start jupyterhub, the new batch script from the config file is correctly used.
I looked into the DB and found the information in the spawners table:
sqlite> select * from spawners;
1|1||||||
2|2||||||
3|3|1|{
"child_conf": {"batch_script": "#!/bin/bash\n#SBATCH --constraint=mc\n#SBATCH --partition={{partition}}\n#SBATCH --ntasks={{ntasks}}\n#SBATCH --time={{runtime}}\n#SBATCH --output={{homedir}}/jupyterhub_batchspawner_%j.log\n#SBATCH --job-name=spawner-jupyterhub\n#SBATCH --workdir={{homedir}}\n#SBATCH --mem-per-cpu=8000\n#SBATCH
...
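A quick way to see which users are affected, without deleting the DB, is to compare each cached script with the current one. The sketch below assumes a `spawners` table with `id`, `user_id` and `state` columns, and that `state` holds JSON text with the script under `child_conf` / `batch_script`, as in the dump above. `find_stale_batch_scripts` is a hypothetical helper, not part of batchspawner or wrapspawner.

```python
import json
import sqlite3


def find_stale_batch_scripts(conn, current_script):
    """Return (id, user_id) of spawners whose cached batch_script differs
    from current_script.

    Assumption: spawners.state holds JSON text and the child config is
    stored under "child_conf", as in the dump above.
    """
    stale = []
    for spawner_id, user_id, raw_state in conn.execute(
        "SELECT id, user_id, state FROM spawners ORDER BY id"
    ):
        if not raw_state:
            continue  # NULL/empty state: nothing cached for this spawner
        state = json.loads(raw_state) or {}
        child_conf = state.get("child_conf") or {}
        cached = child_conf.get("batch_script")
        if cached is not None and cached != current_script:
            stale.append((spawner_id, user_id))
    return stale
```

For example, `find_stale_batch_scripts(sqlite3.connect("file:jupyterhub.sqlite?mode=ro", uri=True), batch_script)` opens the DB read-only, where `batch_script` is the string from the current config file and `jupyterhub.sqlite` is assumed to be the DB location.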
Why this kind of information should be cached and reused is currently beyond me. Are others seeing this effect as well? It is very undesirable, since it currently forces me to delete the database every time I adapt the Slurm script.
Thanks for any information, Derek
From what I can see I think you are, but just to be sure: are you using the latest wrapspawner, too?
If so, I think it's probably related to the latest changes in how wrapspawner and the child spawners relate. I can't check now but will look when I can. Obviously, this isn't expected behavior (or what I have seen).
It was the latest version when I installed in May: wrapspawner commit 5f2b707 from Feb 9, 2018. I can see that there have since been two more commits in wrapspawner, fixing issues 27 and 28, with 28 being a documentation change. I do not think 27 really touches my problem, but I could give it a try.
I wanted fairly recent versions to get the fix for the single-server cleanups (https://github.com/jupyterhub/jupyterhub/pull/2519).
Do all profile options include the batch script? I think you would see this error if some profiles set batch_script and some did not.
I seem to remember something about spawner objects not being re-created, so if one profile sets an option, that option won't be reset to its default when the user later starts a server with another profile.
JH seems to always load the old state from the database, even when all of JH is restarted while the spawner is not running. So the fundamental issue is that wrapspawner doesn't distinguish between creating a child spawner for a new spawn and re-creating it from saved state: in both cases it applies the child config, and the child config is saved as part of the state in the database.
Does this match what you see?
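The mechanism described above can be illustrated with a toy model. `ToyWrapSpawner` is hypothetical and heavily simplified, not wrapspawner's actual code; it only assumes that the child config is written into the saved state and read back on load.

```python
class ToyWrapSpawner:
    """Hypothetical stand-in for a wrapper spawner (NOT wrapspawner's code).

    Models only one pattern: the child config is persisted in the state
    and restored from it on load, overriding the current config file.
    """

    def __init__(self, child_config):
        # Value coming from the config file at hub start-up.
        self.child_config = dict(child_config)

    def get_state(self):
        # The child config is persisted alongside the rest of the state.
        return {"child_conf": dict(self.child_config)}

    def load_state(self, state):
        # Restoring from the DB replaces the freshly configured value.
        if "child_conf" in state:
            self.child_config = dict(state["child_conf"])


# First hub run: the old script is configured and saved to the "DB".
db_state = ToyWrapSpawner({"batch_script": "old"}).get_state()

# Hub restarted with an edited config file, then state is loaded from the DB.
spawner = ToyWrapSpawner({"batch_script": "new"})
spawner.load_state(db_state)
print(spawner.child_config["batch_script"])  # prints "old"
```

In this model the stale value goes away only if `load_state` ignores `child_conf` or the stored state is cleared, which matches the effect of deleting the DB.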
The spawner-relevant configuration is set up like this:
# batch_script (the Slurm template) and form_fields are defined earlier
# in the config file (not shown here).
spawner_config = {
    'batch_script': batch_script,
}
c.JupyterHub.spawner_class = 'optionsspawner.OptionsFormSpawner'
c.OptionsFormSpawner.child_class = 'batchspawner.SlurmSpawner'
c.OptionsFormSpawner.child_config = spawner_config
c.OptionsFormSpawner.form_fields = form_fields
The option widgets in optionspawner only ask for the queue name, run time, etc.; the batch script is always the same template, into which these values are filled based on the user's input.
Yesterday I set up a new test environment using the master/HEAD versions of these components from their repos:
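To make the template/value split concrete: the form only supplies values such as partition and run time, while the template text comes from `batch_script`. The sketch below uses a small regex stand-in for the `{{name}}` substitution; batchspawner itself, as far as I know, renders such templates with Jinja2. `fill_template` and the example values are hypothetical.

```python
import re

# Shortened excerpt of the batch_script template from the config above.
BATCH_TEMPLATE = """#!/bin/bash
#SBATCH --partition={{partition}}
#SBATCH --ntasks={{ntasks}}
#SBATCH --time={{runtime}}
"""

# Matches {{name}} with optional inner whitespace; captures the name.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template, values):
    """Replace each {{name}} in template with str(values[name]).

    Simplified stand-in for Jinja2 rendering: no filters, loops or
    conditionals, and a missing value raises KeyError.
    """
    return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# Hypothetical form input; the template text itself never changes.
script = fill_template(
    BATCH_TEMPLATE, {"partition": "normal", "ntasks": 1, "runtime": "01:00:00"}
)
print(script)
```

Because only the values change per spawn, a stale template restored from state still renders cleanly, which fits the observation that the old script is used without any error.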
git clone https://github.com/jupyterhub/jupyterhub.git
git clone https://github.com/jupyterhub/batchspawner.git
git clone https://github.com/jupyterhub/wrapspawner.git
git clone https://github.com/ResearchComputing/jupyterhub-options-spawner.git
I still see the same behavior with these versions. I think your last sentence captures the problem: for some reason the script gets stored as "state" information, which is reloaded from the DB when the server starts, so the user always gets the script from their cached state in the DB.
I have not gone into the code to understand this, but do you think this is a general problem of wrapspawner and therefore of all spawners derived from it?
Just wanted to update this long-standing issue with the workaround I am using.
The batch script is stored in the state field of each user's entry in the spawners table. I confirmed that deleting that field's value does not cause any damage, even for a running session. Clearing the users' state fields while the service is shut down is a workaround that lets you keep the DB:
UPDATE spawners SET state = null;
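The same workaround as a small Python script, for example as part of a service shutdown procedure. It assumes the hub is stopped and the DB is a local SQLite file; `clear_spawner_state` and the backup copy are additions for illustration, not part of JupyterHub.

```python
import shutil
import sqlite3


def clear_spawner_state(db_path, backup_path):
    """Back up the hub DB, then run the workaround UPDATE from this thread.

    Intended to run only while JupyterHub is shut down.
    Returns the number of rows whose state was cleared.
    """
    shutil.copy2(db_path, backup_path)  # keep a copy in case something goes wrong
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE spawners SET state = NULL WHERE state IS NOT NULL"
            )
        return cur.rowcount
    finally:
        conn.close()
```

The `WHERE state IS NOT NULL` clause does not change the result of the original statement; it only makes the returned count reflect the rows that actually held state.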