stopping job command freezes job
This is for bugs only
Did you already ask in the discord?
No
You verified that this is a bug and not a feature request or question by asking in the discord?
Yes
Describe the bug
Sometimes pressing the Stop Job button (at least in the Docker version of the system) puts the job into a permanent "stopping job" state without actually stopping it. You have to force-kill the Python process and then update the database manually if you want to rerun the job. I have no idea what the cause is.
same problem
Which database file specifically needs to be modified?
Just ran into this issue. I was able to resolve it by downloading DB Browser for SQLite: https://sqlitebrowser.org/dl/
and opening aitk_db.db from the root folder,
then opening the jobs table, changing the text for "status" to "stopped", and writing the changes to the file. This allowed me to start the job back up again.
Thanks @indianaorz, that was very helpful! For anybody who, like me, is not so comfortable messing around in databases, here is a very short walkthrough:
- Close AI Toolkit, navigate to your AI Toolkit folder, and find the "aitk_db.db" database.
- Install DB Browser as @indianaorz mentioned: https://sqlitebrowser.org/dl/
- Open "aitk_db.db" with DB Browser and go to the "Browse Data" tab.
- Here you see the jobs you have created.
- In the "status" column, change the word "running" to "stopped".
- Save the file, restart AI Toolkit, and continue your training.
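If you would rather check the current state from a script before opening DB Browser, the following is a minimal sketch. It assumes the table is called Job and has id, name, and status columns, as the SQL and Python snippets in this thread suggest; `list_jobs` is a hypothetical helper name, not part of AI Toolkit.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_jobs(db_path):
    """Return (id, name, status) for every row in the Job table, read-only."""
    # mode=ro opens the file read-only and fails if it does not exist,
    # instead of silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as con:
        # Table and column names are assumptions taken from this thread.
        return con.execute(
            "SELECT id, name, status FROM Job ORDER BY name"
        ).fetchall()
```

Calling `list_jobs("aitk_db.db")` from the AI Toolkit folder should show which jobs are still marked as running, and it does not modify the file.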
Same problem. It happens a lot, including after a crash.
@StefanRademakers did you get it working again with the steps above?
Just for reference, in SQL I used
sqlite> UPDATE Job
...> SET status = 'completed',
...> stop = 0
...> WHERE name = 'my-job-name';
https://discordapp.com/channels/1144033039786188891/1404578747407139027/1404587816624848977
Had a similar issue. I can confirm that this works to get a job back to standby, ready to be started again.
I did this, but now the job status says "stopped" while the job info still says "stopping job...", and I'm unable to start the training again.
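When the status and the displayed job info disagree like this, it can help to look at every column of the row instead of only status. This thread does not say which column holds the "stopping job" text, so the sketch below selects all of them without guessing. It assumes only the Job table and its name column from the SQL above; `dump_job` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def dump_job(db_path, job_name):
    """Return every column of the matching Job rows as a list of dicts."""
    # Read-only URI: inspecting the row never changes the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as con:
        # sqlite3.Row lets each row be converted to a dict keyed by column name.
        con.row_factory = sqlite3.Row
        rows = con.execute(
            "SELECT * FROM Job WHERE name = ?", (job_name,)
        ).fetchall()
        return [dict(row) for row in rows]
```

Printing `dump_job("aitk_db.db", "my-job-name")` shows all fields at once, which should make it easier to spot any that still reflect the stuck state.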
Thanks, setting my job to stopped with this command worked for me:
python - <<'PY'
import sqlite3

# Run from the folder that contains aitk_db.db.
con = sqlite3.connect("aitk_db.db")
cur = con.cursor()

# Replace with the id of the stuck job.
job_id = "91534fcb-a468-4478-ab41-c8bcaeae624d"
cur.execute("""
UPDATE Job
SET status='stopped',
    stop=1,
    updated_at=strftime('%s','now')
WHERE id=?;
""", (job_id,))
con.commit()
print("Job", job_id, "updated to 'stopped'.")
con.close()
PY
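A slightly more defensive variant of the script above, as a sketch only: it copies the database before touching it and returns how many rows were changed, so a mistyped id does not go unnoticed. The Job table and its id, status, and stop columns are taken from the snippets in this thread; `reset_job` and the `.bak` file name are hypothetical. Replies here use stop = 0 in one case and stop = 1 in another, so the value is left as an argument rather than chosen for you.

```python
import sqlite3
from contextlib import closing


def reset_job(db_path, job_id, status, stop, backup_path=None):
    """Back up the database, then update one Job row; return rows changed."""
    backup_path = backup_path or f"{db_path}.bak"
    with closing(sqlite3.connect(db_path)) as con:
        # Copy the whole database first so a bad edit can be undone.
        with closing(sqlite3.connect(backup_path)) as bak:
            con.backup(bak)
        cur = con.execute(
            "UPDATE Job SET status = ?, stop = ? WHERE id = ?",
            (status, stop, job_id),
        )
        con.commit()
        # 0 means the id matched nothing, e.g. a typo in job_id.
        return cur.rowcount
```

For example, `reset_job("aitk_db.db", "<your job id>", "stopped", 0)` run with AI Toolkit closed; if it returns 0, nothing was changed and the id should be double-checked.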
Another workaround, if you don't want to edit database files:
- Rename the job folder, e.g. to ai-toolkit/output/job_name_temp.
- Create an empty folder with the original name, e.g. ai-toolkit/output/job_name. I don't know if this is strictly necessary; I did it just in case.
- Delete the job in the UI. This happens immediately; it will clean up the job folder, but that's now just an empty folder.
- Rename the job folder back from job_name_temp to job_name.
- Create a new job and copy the contents of job_name/config.yaml into the advanced JSON view to get all the settings back.
- Start it. It will resume from the last saved model, which is a similar drawback to the other workaround as far as I can tell.
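The folder shuffle in this workaround can also be scripted. This is a rough sketch assuming the ai-toolkit/output/job_name layout described above; `park_job_folder` and `restore_job_folder` are hypothetical helper names. Deleting the job in the UI and creating the new job still have to be done by hand between the two calls.

```python
from pathlib import Path


def park_job_folder(output_dir, job_name):
    """Move the real job folder aside and leave an empty placeholder."""
    job_dir = Path(output_dir) / job_name
    parked = job_dir.with_name(job_name + "_temp")
    if parked.exists():
        raise FileExistsError(f"{parked} already exists")
    job_dir.rename(parked)
    job_dir.mkdir()
    return parked


def restore_job_folder(output_dir, job_name):
    """After deleting the job in the UI, move the real folder back."""
    job_dir = Path(output_dir) / job_name
    parked = job_dir.with_name(job_name + "_temp")
    if job_dir.exists():
        # rmdir only removes empty directories, so real data is never deleted here.
        job_dir.rmdir()
    parked.rename(job_dir)
    return job_dir
```

Call `park_job_folder("ai-toolkit/output", "job_name")`, delete the job in the UI, then call `restore_job_folder("ai-toolkit/output", "job_name")` before creating the new job.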
Thank you man!!!!