
Training stuck: job keeps starting, no task found

Open sharkDDD opened this issue 3 weeks ago • 10 comments

My training is stuck here. It keeps showing "running," but no task processes can be found when checking the GPU, there is no GPU utilization, and there are no task error messages. Can someone help me? Thanks so much.

Image

sharkDDD avatar Dec 04 '25 14:12 sharkDDD

me too!

y93442100-beep avatar Dec 05 '25 08:12 y93442100-beep

The same thing happened to me. As a last resort, I installed everything manually (on Windows) following the steps below, and I was then able to run my first training. I think the error happens because some of the pip install commands were executed outside the virtual environment, which means that "python -m venv venv" and ".\venv\Scripts\activate" are very important for it to work.

git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
python -m venv venv
.\venv\Scripts\activate
pip install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

You should try in a new directory and install everything manually step by step…
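
If the suspicion above is right (packages installed outside the virtual environment), a quick check from the same terminal that launches the toolkit may help confirm it. This is only a minimal sketch: the helper names `in_virtualenv` and `environment_report` are made up for illustration and are not part of ai-toolkit, and the package list simply mirrors the pip command above.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def environment_report(packages=("torch", "torchvision", "torchaudio")) -> dict:
    """Collect basic facts about the current interpreter (hypothetical helper)."""
    return {
        "executable": sys.executable,
        "in_venv": in_virtualenv(),
        # find_spec only locates a package; it does not import it.
        "installed": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    report = environment_report()
    print(f"interpreter: {report['executable']}")
    print(f"inside venv: {report['in_venv']}")
    for name, found in report["installed"].items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If "inside venv" prints False, or the interpreter path is not under the ai-toolkit folder, the venv was probably not activated in that terminal.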

pcorezzola avatar Dec 05 '25 11:12 pcorezzola

Had the same issue and I just reinstalled the packages (make sure to run as admin):

python -m venv venv
.\venv\Scripts\activate
pip install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

and then in a new terminal (not as admin):

cd ui
npm run build_and_start

and now it works perfectly fine

nilsesen avatar Dec 05 '25 13:12 nilsesen

I ran into this issue after I moved the ai-toolkit folder to a different location.

devichris avatar Dec 06 '25 21:12 devichris

Same problem here. Nothing happens. Just... Nothing :/ Will try installing manually to see if that works better. At least I have a better overview of what's installed and where then...

Yepp... Installing manually worked. Started training right away. DON'T USE THE AUTOMATIC INSTALL!

Kallamamran avatar Dec 07 '25 23:12 Kallamamran

Yes, manual works, thank you!

grandmaneedsmorecake avatar Dec 14 '25 16:12 grandmaneedsmorecake

I had the same problem, but installing manually did not work for me. So I tried something else, and this time it worked: delete the venv folder, uninstall the old Python (mine was 3.13), and reinstall with Python 3.12. Then run these commands:

python -m venv venv
.\venv\Scripts\activate
pip install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
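
Before recreating the venv, it may be worth checking which Python version the terminal actually picks up. A minimal sketch follows; note that 3.12 is only the version reported to work in this comment, not a requirement confirmed by the project, and `matches_reported_version` is a hypothetical helper name.

```python
import sys

# Assumption: 3.12 is simply the version this commenter had success with;
# it is not taken from any official requirement.
REPORTED_WORKING = (3, 12)


def matches_reported_version(version_info=None, wanted=REPORTED_WORKING) -> bool:
    """Compare major.minor of an interpreter against the reported-working pair."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(wanted)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_reported_version():
        print(f"Python {current}: matches the version reported to work")
    else:
        print(f"Python {current}: differs from 3.12; consider recreating the venv")
```

Run it with the same "python" command used to create the venv, since several Python installations can coexist on one machine.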

thangquay347 avatar Dec 16 '25 11:12 thangquay347

Manually cloning the models from Hugging Face and hardcoding the path to the model/adapter worked for me to bypass this problem. It seems the default installer has trouble fetching the models from Hugging Face.
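
For anyone who wants to try the same workaround without git, the download step could be sketched roughly as below. This assumes the `huggingface_hub` package is installed in the venv and uses its `snapshot_download` function; the repository id, the "models" folder, and both helper names are placeholders, since the comment does not say which model was cloned or where the path was hardcoded.

```python
from pathlib import Path


def local_model_dir(repo_id: str, root: str = "models") -> Path:
    """Map 'owner/name' to a local folder such as models/owner__name."""
    return Path(root) / repo_id.replace("/", "__")


def download_model(repo_id: str, root: str = "models") -> Path:
    """Download a full model repository and return its local folder."""
    # Imported lazily so local_model_dir works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    # Gated repositories additionally need a Hugging Face access token.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Placeholder id: substitute the model your job config actually references.
    print(local_model_dir("some-org/some-model"))
```

The folder returned by `download_model` would then be the local path to point the job at, in place of the Hugging Face model name.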

Image

ericluo05 avatar Dec 22 '25 06:12 ericluo05

I am also stuck there. After hitting play to start the queue, nothing happens.

mertakbal avatar Dec 22 '25 16:12 mertakbal

Mine starts, and an empty Python terminal appears and disappears. It is still stuck on an infinite "Starting job...". I tried all the tips suggested in this issue :C

Ozamatheus avatar Dec 24 '25 01:12 Ozamatheus