No progress of algorithm

Open • cucuuumber opened this issue 5 years ago • 1 comment

Hi!

I tried running the code you provided on the pretrained MobileNet model that you gave as an example. I believe the LUTs were calculated correctly. I then tried applying the algorithm with 0.5 latency by running:

sh scripts/netadapt_mobilenet-0.5latency.sh

but after some time the algorithm stopped making progress at:

Launch a worker for block 13 ['/usr/local/bin/python', 'worker.py', 'models/mobilenet/prune-by-latency/worker', 'models/mobilenet/prune-by-latency/master/iter_0_best_model.pth.tar', '13', 'LATENCY', '0.033316052734851845', '1', '500', '0', 'latency_lut/lut_mobilenet.pkl', 'data/', '3', '224', '224', 'mobilenet', '0.001']
Update job list: [{'iteration': 1, 'block': 1, 'gpu': 1}, {'iteration': 1, 'block': 2, 'gpu': 2}, {'iteration': 1, 'block': 3, 'gpu': 3}, {'iteration': 1, 'block': 4, 'gpu': 4}, {'iteration': 1, 'block': 5, 'gpu': 5}, {'iteration': 1, 'block': 6, 'gpu': 6}, {'iteration': 1, 'block': 13, 'gpu': 0}]
Update available gpu: []

Despite 3 days of computing, it did not progress any further. Could you advise me if something is wrong there?

Regards, Piotrek

cucuuumber, May 18 '20 06:05

It looks like at least one of these workers crashed, so the master never heard back from it. You can check the logs of these workers to figure out why they crashed.
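For illustration only, here is a minimal sketch of how a master process could notice a crashed worker instead of waiting forever. This is not NetAdapt's actual code: the helper names `wait_for_workers` and `crashed_workers` are hypothetical, and it assumes each worker is launched as a subprocess (as the `Launch a worker` log line suggests) and signals failure through a non-zero exit code.

```python
import subprocess
import sys
import time


def wait_for_workers(commands, poll_interval=0.1, timeout=None):
    """Launch each command and return {index: exit_code} once all have exited.

    Hypothetical helper: raises TimeoutError if any worker is still running
    after `timeout` seconds, so the caller never blocks indefinitely.
    """
    procs = {i: subprocess.Popen(cmd) for i, cmd in enumerate(commands)}
    results = {}
    start = time.monotonic()
    while procs:
        for i, proc in list(procs.items()):
            code = proc.poll()  # None while the process is still running
            if code is not None:
                results[i] = code
                del procs[i]
        if procs and timeout is not None and time.monotonic() - start > timeout:
            stuck = sorted(procs)
            for proc in procs.values():
                proc.kill()
                proc.wait()  # reap the killed process
            raise TimeoutError(f"workers still running: {stuck}")
        if procs:
            time.sleep(poll_interval)
    return results


def crashed_workers(results):
    """Return the indices of workers that exited with a non-zero code."""
    return sorted(i for i, code in results.items() if code != 0)


if __name__ == "__main__":
    # Stand-in commands: one worker succeeds, one exits with an error code.
    demo = [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    print("crashed workers:", crashed_workers(wait_for_workers(demo, timeout=30)))
```

A check like this only reports that a worker failed; the reason still has to come from that worker's own log or stderr output.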

denru01, May 18 '20 16:05