DINO
How to check the progress of a distributed run started with "bash scripts/DINO_train_submitit.sh /path/to/my/COCODIR"
I am using PyTorch 1.11 on Ubuntu 20.04. The system configuration works fine with the command "bash scripts/DINO_train.sh /path/to/my/COCODIR". I then submitted a distributed run with "bash scripts/DINO_train_submitit.sh /path/to/my/COCODIR". The terminal (command line window) shows "Submitted job_id: 11007" and returns to the system prompt. Nothing else shows up in the terminal after that. Does that mean the distributed run is still running, or did something go wrong? I checked the "experiments" folder and nothing has been generated there either. How can I find out whether my training job has terminated or is still progressing? And if training is in progress, how can I see how far it has progressed, e.g. the number of epochs completed?