
stdout can't be viewed as job progresses

Open stsievert opened this issue 4 years ago • 3 comments

What's your issue? I have launched several (supposedly) short jobs. On EC2 with a modern NVIDIA GPU, they take around 40 minutes. I launched the same jobs on HTCondor and specified a less modern GPU. The jobs apparently take at least 120 minutes on this lower-capability GPU.

I'd like some idea of each job's progress, so I'm printing a few items to stdout. Let's view the output of one of the running jobs:

(base) [stsievert@submit2 exp-cifar10]$ htmap stdout adadamp 0
# hangs...

This hangs indefinitely, so I can't monitor the progress of any one component; I have to wait for that component to complete.

What would resolve your issue? Being able to view a job's stdout even if the job hasn't completed. For example:

(base) [stsievert@submit2 exp-cifar10]$ htmap stdout foo 0  # get output so far
iteration 1 out of 100, loss: 2.2
iteration 2 out of 100, loss: 1.2
iteration 3 out of 100, loss: 0.9
(base) [stsievert@submit2 exp-cifar10]$ # wait a while
(base) [stsievert@submit2 exp-cifar10]$ htmap stdout foo 0
iteration 1 out of 100, loss: 2.2
iteration 2 out of 100, loss: 1.2
iteration 3 out of 100, loss: 0.9
# ...
iteration 20 out of 100, loss: 0.02
iteration 21 out of 100, loss: 0.017
(base) [stsievert@submit2 exp-cifar10]$ # wait even longer
(base) [stsievert@submit2 exp-cifar10]$ htmap stdout foo 0
iteration 1 out of 100, loss: 2.2
iteration 2 out of 100, loss: 1.2
iteration 3 out of 100, loss: 0.9
# ...
iteration 98 out of 100, loss: 0.0012
iteration 99 out of 100, loss: 0.001
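
To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not htmap's implementation. It assumes the component's stdout is already being written to a file on the submit node while the job runs (for example, with HTCondor's `stream_output` submit command enabled). The function name `read_stdout_so_far` and the file name `0.stdout` are hypothetical.

```python
import os
import tempfile


def read_stdout_so_far(path):
    """Return whatever has been written to `path` so far.

    Hypothetical helper: returns an empty string if the file does not
    exist yet, and never waits for the job to finish.
    """
    try:
        with open(path, "r", errors="replace") as f:
            return f.read()
    except FileNotFoundError:
        # The job has not produced any output yet.
        return ""


if __name__ == "__main__":
    # Simulate a job whose stdout file grows over time.
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "0.stdout")  # hypothetical file name
        print(repr(read_stdout_so_far(log)))  # nothing written yet
        with open(log, "a") as f:
            f.write("iteration 1 out of 100, loss: 2.2\n")
        print(read_stdout_so_far(log), end="")  # partial output so far
```

The point is only that the read returns immediately with the partial output instead of blocking until the component completes.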

stsievert · Apr 08 '20 21:04