No Progress
Hello, I have installed the toolkit but when I click start it says starting but nothing happens. Left it for over 1 hour with no progress and nothing showing up in the progress window.
Same here. Additionally queue management doesn't seem to work. Whenever I try to stop a job it just enters a loop, printing '[WORKER] Stopping job [some ID] on GPU(s) 0'.
I have the same issue. 4090, 64 GB RAM. Nothing shows up in progress, but in the past it worked fine. I recently updated and tried a fresh install as well.
I deleted everything and reinstalled using their easy installer, and it worked, even though it did not the first time. So give it a try.
I am also facing the same issue. It only says Starting Job and shows no progress.
Same here on RTX 6000 Pro. Any solutions?
For me, python run.py .../.job_config.json silently runs out of memory (OOM) and the process stops. It's probably a bug in newer versions; see this comment: https://github.com/ostris/ai-toolkit/issues/457#issuecomment-3393679558
Use a venv environment instead of a conda environment; then the progress shows up.
I looked into it further:
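If the job dies silently like this, running the same command by hand and looking at how it exited can help tell a kill from an ordinary failure. A minimal sketch, assuming a POSIX system, where Python's `subprocess` reports "killed by signal N" as return code -N (the Linux OOM killer sends SIGKILL). `describe_exit` and `run_and_report` are hypothetical helper names, not part of the toolkit, and the config path in the usage comment is a placeholder:

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable hint."""
    if returncode == 0:
        return "exited cleanly (code 0)"
    if returncode < 0:
        # On POSIX, a negative code -N means the process was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"failed with exit code {returncode}"


def run_and_report(cmd: list[str]) -> str:
    """Run cmd with its output going to the terminal, then describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


# Placeholder usage (substitute the path to your own job config):
#   print(run_and_report([sys.executable, "run.py", "<your .job_config.json>"]))
```

A result of "killed by SIGKILL" points toward the system killing the process (for example for running out of RAM), while a positive exit code usually means Python itself raised an error that should be visible in the terminal output.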
Starting a job only queues it, but doesn't automatically start the queue. You need to manually start the queue through the UI.
To Prevent This in the Future:
There should be a "Start Queue" button in the UI at: http://192.168.1.101:8675/jobs
I pushed a hotfix here: https://github.com/johndpope/ai-toolkit/commit/fd55becb57bb7d6d69e69e07fd95df802e4394e2
SAME ISSUE
Edit: The issue is that the requirements are not being installed properly into the venv's (or python-embeded) Lib\site-packages folder. Even activating the venv before installing doesn't seem to guarantee that things end up inside the venv. Use whatever LLM you want to help you run pip install with --target so the Python dependencies go to the correct folder.
To troubleshoot the exact issue, you can view the logs at the following URL:
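One way to see where packages will actually land is to ask the interpreter itself. A minimal sketch using only the standard library; `environment_report` is a hypothetical helper name, not part of the toolkit. Run it with the same Python executable the toolkit uses:

```python
import sys
import sysconfig


def environment_report() -> dict:
    """Collect the facts that show which environment this interpreter belongs to."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Default install location for pure-Python packages of this interpreter.
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If `site_packages` is not the folder you expected, calling pip through that exact interpreter (`python -m pip install -r requirements.txt`) ties the install to it rather than to whichever `pip` happens to be first on PATH.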
http://localhost:8675/api/jobs/
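The same URL can also be read from a script, which helps when the browser view is hard to copy from. A minimal sketch using the standard library; `fetch_text` is a hypothetical helper, and since the shape of the response is not described in this thread, it just returns the raw body:

```python
import urllib.error
import urllib.request


def fetch_text(url: str, timeout: float = 5.0) -> str:
    """Return the response body as text, or a short description of the failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return f"HTTP error {exc.code} from {url}"
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return f"could not reach {url}: {exc}"


if __name__ == "__main__":
    print(fetch_text("http://localhost:8675/api/jobs/"))
```

A "could not reach" message here suggests the UI server itself is not running on that port, which is a different problem from a job that starts and then stalls.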
In my case, the CUDA and torch libraries were not installed correctly.
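A quick way to check for this kind of broken install is to ask torch directly, from the same environment the toolkit runs in. A minimal sketch; `torch_cuda_status` is a hypothetical helper name, and the messages are only illustrative:

```python
import importlib.util


def torch_cuda_status() -> str:
    """Describe whether torch is importable and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    try:
        import torch  # imported lazily so a broken install is reported, not raised
    except Exception as exc:
        return f"torch is installed but failed to import: {exc}"
    if torch.version.cuda is None:
        return f"torch {torch.__version__} was not built with CUDA support"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} (CUDA {torch.version.cuda}) cannot see a GPU"
    return f"torch {torch.__version__} sees {torch.cuda.get_device_name(0)}"


print(torch_cuda_status())
```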
Admit it guys. Like me, most of y'all forgot to do the pip-install on requirements.txt and tried to run the training. I feel so stupid sometimes 🤣
That's it. I installed AI Toolkit yesterday and forgot this simple step. Thanks
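To catch the skipped pip install step before starting a job, the entries in requirements.txt can be compared against what is installed. A rough sketch using `importlib.metadata`; the parser is deliberately naive (it skips option lines and URL-based entries and ignores version pins), and both helper names are hypothetical:

```python
import re
from importlib import metadata
from pathlib import Path

_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines):
    """Extract plain distribution names, skipping comments, options, and URLs."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names):
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


req_file = Path("requirements.txt")  # assumes you run this from the toolkit folder
if req_file.exists():
    names = requirement_names(req_file.read_text(encoding="utf-8").splitlines())
    print("missing:", missing_distributions(names) or "none")
```

An empty result does not prove the versions are right, only that each named package is present in the environment running the script.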