jupyter-scheduler
Automatically delete all old Notebook jobs
Problem
I have dozens of jobs running each day. Each Notebook job and its results are persisted in the DB and on disk. Unfortunately, to clean up old jobs I have to click the delete button twice for each one. I currently have around 60 jobs a day, which makes it almost impossible to clean up results that are no longer needed.
Proposed Solution
It would make sense to be able to clean up old and outdated jobs in the scheduler. For instance, there could be an option on a job definition, or a global scheduler option, to remove jobs executed more than N days ago.
An option to delete all jobs from a job definition would also make sense, so that all old jobs can be deleted at once.
Additional context
I would like to revive this thread; this is something that would improve the whole experience a lot.
Is there any way of doing the same thing from outside the UI? I have some jobs that run every minute and generate hundreds of thousands of reports very quickly.
Until this feature is implemented, I have created a very short notebook that runs once a day and removes jobs older than N days from the included SQLite database. FYI if you are interested @quentindurpoix.
I do something similar for failed notebook jobs, using a service email account I have for system notices.
Here is the code:
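import datetime
import sqlite3

# Path to the scheduler's SQLite database; adjust it for your own setup.
conn = sqlite3.connect('/home/ritz/.local/share/jupyter/scheduler.sqlite')
cur = conn.cursor()
# Simply change the number of days to whatever you want.
# The threshold is a Unix timestamp in milliseconds, to match start_time.
cutoff = datetime.datetime.now() - datetime.timedelta(days=60)
timestamp_threshold = int(cutoff.timestamp() * 1000)
# Pass the threshold as a bound parameter instead of formatting it into the SQL.
cur.execute("DELETE FROM jobs WHERE start_time < ?", (timestamp_threshold,))
conn.commit()
conn.close()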
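If you want to check how many jobs would be removed before actually deleting anything, the same idea can be wrapped in a function with a dry-run switch. This is only a minimal sketch: the function name `delete_old_jobs` and the `dry_run` parameter are my own, and the only things it assumes about the database are the `jobs` table and the millisecond `start_time` column used in the snippet above. Like that snippet, it only deletes database rows and does not touch any files on disk.

```python
import datetime
import sqlite3


def delete_old_jobs(db_path, days, dry_run=True):
    """Count jobs older than `days`; delete them too unless dry_run is True.

    Hypothetical helper: assumes a `jobs` table whose `start_time`
    column holds a Unix timestamp in milliseconds.
    Returns the number of rows that matched the age threshold.
    """
    cutoff = datetime.datetime.now() - datetime.timedelta(days=days)
    threshold_ms = int(cutoff.timestamp() * 1000)
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        # Count first so the caller can see what a real run would remove.
        cur.execute("SELECT COUNT(*) FROM jobs WHERE start_time < ?", (threshold_ms,))
        matched = cur.fetchone()[0]
        if not dry_run:
            cur.execute("DELETE FROM jobs WHERE start_time < ?", (threshold_ms,))
            conn.commit()
        return matched
    finally:
        # Close the connection even if a query fails.
        conn.close()
```

Call it once with `dry_run=True` to see the count, then again with `dry_run=False` once the number looks right. Making a copy of the database file before the first real run is a cheap safety net.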
It is working perfectly and that's what I was looking for! Thank you so much @robertritz
Let's hope that a real feature will be implemented in the future!