
stopping job command freezes job

Open lemuriandezapada opened this issue 6 months ago • 13 comments

This is for bugs only

Did you already ask in the discord?

No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes

Describe the bug

Sometimes pressing the stop job button (at least in the Docker version of the system) leaves the job in a permanent "stopping job" state without actually stopping it. You have to force-kill the Python process and then update the database manually if you want to rerun the job. I have no idea what the cause is.

Image

lemuriandezapada avatar Jun 18 '25 17:06 lemuriandezapada

same problem

Songssx avatar Jun 30 '25 04:06 Songssx

Which database file specifically needs to be modified?

fishinboat avatar Jul 29 '25 02:07 fishinboat

just ran into this issue. was able to resolve it by downloading DB Browser for SQLite: https://sqlitebrowser.org/dl/

and opening aitk_db.db from the root folder

then opening the Job table, changing the text for "status" to "stopped" and then writing the changes. This allowed me to start the job back up again.

indianaorz avatar Aug 07 '25 04:08 indianaorz

thanks @indianaorz - that was very helpful! For anybody not so comfortable messing around in DBs, like me, here is a very short walkthrough:

  1. close AI Toolkit, navigate to your AI Toolkit folder and find the "aitk_db.db" database
  2. install "DB Browser" as @indianaorz mentioned: https://sqlitebrowser.org/dl/
  3. open "aitk_db.db" with DB Browser and go to the "Browse Data" tab
  4. here you see the jobs you have created
  5. in the "status" column, change the word "running" to "stopped"
  6. save the file, restart AI Toolkit and continue your training
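
For anyone who would rather not install a GUI, the same look at the jobs can be done with Python's built-in sqlite3 module. This is only a sketch: it assumes the table is called Job and has id, name and status columns, which is what the SQL posted elsewhere in this thread suggests, and list_jobs is just a name made up for the example. It opens the file read-only, so it cannot change anything.

```python
import sqlite3
from pathlib import Path


def list_jobs(db_path="aitk_db.db"):
    """Return (id, name, status) for every row in the Job table.

    Assumes a table named Job with id, name and status columns,
    as used in the SQL snippets in this thread.
    """
    # Build a file: URI with mode=ro so this inspection cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute(
            "SELECT id, name, status FROM Job ORDER BY name"
        ).fetchall()
    finally:
        con.close()
```

Calling list_jobs("aitk_db.db") from the AI Toolkit root folder should show which job is still marked as running.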

LiquefyR avatar Aug 17 '25 09:08 LiquefyR

Same problem, happens a lot, also after crash

StefanRademakers avatar Aug 20 '25 11:08 StefanRademakers

@StefanRademakers do you get it working again with the steps above?

LiquefyR avatar Aug 20 '25 11:08 LiquefyR

Just for reference, in SQL I used

sqlite> UPDATE Job
   ...> SET status = 'completed',
   ...>     stop = 0
   ...> WHERE name = 'my-job-name';

https://discordapp.com/channels/1144033039786188891/1404578747407139027/1404587816624848977
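
If you'd rather script this than type it into the sqlite shell, here is a rough Python equivalent that also copies the database first so the edit can be undone. Assumptions: AI Toolkit is closed while this runs, reset_job and the .bak suffix are names invented for this example, and the default status of 'stopped' is taken from other comments in this thread (the statement above uses 'completed').

```python
import shutil
import sqlite3
from pathlib import Path


def reset_job(db_path, job_name, status="stopped"):
    """Back up the database, then mark one job as no longer running.

    Returns the number of rows changed (0 means no job has that name).
    """
    db_path = Path(db_path)
    # Keep a copy next to the original so the manual edit can be undone.
    backup_path = db_path.with_name(db_path.name + ".bak")
    shutil.copy2(db_path, backup_path)

    con = sqlite3.connect(db_path)
    try:
        # Same columns as the UPDATE statement above: status and stop.
        cur = con.execute(
            "UPDATE Job SET status = ?, stop = 0 WHERE name = ?",
            (status, job_name),
        )
        con.commit()
        return cur.rowcount
    finally:
        con.close()
```

A return value of 0 is a hint that the job name was mistyped, which the bare UPDATE would not tell you.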

alankent avatar Aug 27 '25 22:08 alankent

Had a similar issue, I can confirm that this works to get a job back to stand-by, and ready to be started again

daracazamea avatar Sep 08 '25 11:09 daracazamea

I did this, but now the job status says "stopped" while the job info says "stopping job..." and I'm unable to start the training again.
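
One way to narrow this down might be to dump every column of the job's row and see where the "stopping job..." text actually lives. That it is stored in a column other than status is only a guess; nothing in this thread confirms it. The sketch below assumes only the Job table and its name column (both appear in the SQL in this thread), and dump_job is a made-up helper name.

```python
import sqlite3


def dump_job(db_path, job_name):
    """Return every column of the matching Job rows as a list of dicts."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # lets us read values by column name
    try:
        rows = con.execute(
            "SELECT * FROM Job WHERE name = ?", (job_name,)
        ).fetchall()
        # Convert each Row to a plain dict so it prints readably.
        return [dict(row) for row in rows]
    finally:
        con.close()
```

Printing the result of dump_job("aitk_db.db", "my-job-name") shows all stored fields at once, without having to know the column names in advance.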

Fijitrix avatar Sep 11 '25 01:09 Fijitrix

Thanks, worked for me setting my job to stopped with this command:

python - <<'PY'
import sqlite3

# Run from the folder that contains aitk_db.db
con = sqlite3.connect("aitk_db.db")
cur = con.cursor()

# id of the stuck job (replace with your own)
job_id = "91534fcb-a468-4478-ab41-c8bcaeae624d"
cur.execute("""
UPDATE Job
SET status='stopped',
    stop=1,
    updated_at=strftime('%s','now')
WHERE id=?;
""", (job_id,))
con.commit()
print("Job", job_id, "updated to 'stopped'.")
con.close()
PY

dxjmz avatar Sep 30 '25 20:09 dxjmz

another workaround, if you don't want to edit database files:

  1. rename the job folder, e.g. to ai-toolkit/output/job_name_temp
  2. create an empty folder with the original name, e.g. ai-toolkit/output/job_name - I don't know if this is strictly necessary, just in case
  3. delete the job in the UI; this happens immediately and cleans up the job folder, but that's now just an empty folder
  4. rename the job folder back from job_name_temp to job_name
  5. create a new job, and copy the contents of job_name/config.yaml into the advanced json view to get all the settings back
  6. start it; it will resume from the last saved model - which is a similar drawback to the other workaround as far as I can tell
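
Steps 1, 2 and 4 can be scripted if you do this often. A minimal sketch with pathlib, assuming the ai-toolkit/output/job_name layout described above; park_job_folder and restore_job_folder are hypothetical helper names, and steps 3, 5 and 6 still happen in the UI.

```python
from pathlib import Path


def park_job_folder(output_dir, job_name):
    """Steps 1-2: move the job folder aside and leave an empty placeholder."""
    original = Path(output_dir) / job_name
    parked = Path(output_dir) / f"{job_name}_temp"
    if parked.exists():
        raise FileExistsError(f"{parked} already exists")
    original.rename(parked)
    original.mkdir()  # empty folder for the UI to clean up
    return parked


def restore_job_folder(output_dir, job_name):
    """Step 4: after deleting the job in the UI, move the real folder back."""
    original = Path(output_dir) / job_name
    parked = Path(output_dir) / f"{job_name}_temp"
    if original.exists():
        # rmdir only succeeds on an empty folder, so real data is never removed.
        original.rmdir()
    parked.rename(original)
    return original
```

Run park_job_folder before deleting the job in the UI and restore_job_folder afterwards, e.g. with output_dir set to "ai-toolkit/output".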

deefster avatar Nov 06 '25 22:11 deefster

> thanks @indianaorz - that was very helpful! For anybody not so comfortable messing around in DBs, like me, here is a very short walkthrough:
>
>   1. close AI Toolkit, navigate to your AI Toolkit folder and find the "aitk_db.db" database
>   2. install "DB Browser" as @indianaorz mentioned: https://sqlitebrowser.org/dl/
>   3. open "aitk_db.db" with DB Browser and go to the "Browse Data" tab
>   4. here you see the jobs you have created
>   5. in the "status" column, change the word "running" to "stopped"
>   6. save the file, restart AI Toolkit and continue your training

Thank you man!!!!

Genzo9319 avatar Dec 02 '25 19:12 Genzo9319