
Running multiple Jobs on HPC using Slurm

Open hrakhshani opened this issue 6 years ago • 4 comments

Hello,

I tried to run several jobs on a cluster. The jobs run on the server, but the output files are always empty. I would be grateful if you could help me.

Thank you in advance.

TSC.txt

hrakhshani avatar May 19 '19 21:05 hrakhshani

One possible reason is that Python buffers its output. To disable buffering, add the -u flag when calling your Python script:

python -u my_hp_script.py
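If changing the command line in the job script is inconvenient, the same effect can be approximated from inside the script by flushing after every write. The snippet below is only a sketch using the standard library; the helper name `log` is made up for illustration and is not part of HpBandSter.

```python
import sys


def log(message, stream=sys.stdout):
    """Write one line and flush immediately, so it reaches the output
    file even if the job is killed before the buffer fills up."""
    print(message, file=stream, flush=True)


if __name__ == "__main__":
    log("worker started")
```

Setting the environment variable PYTHONUNBUFFERED=1 in the job script is another way to get unbuffered output without touching the code.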

matze298 avatar Jun 06 '19 10:06 matze298

I tried it, but it didn't work.

hrakhshani avatar Jun 12 '19 11:06 hrakhshani

Do I see that correctly that your code is based on the first example? If so, the problem is that the workers and the master are trying to communicate using 127.0.0.1, i.e. the loopback interface. This is fine for running things locally, but it doesn't work on the cluster, where every worker might be on a different machine. Please have a look at the fourth example, which shows how that could be done. Let me know if you need any further help.
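To catch this kind of misconfiguration early, a script could check the host it is about to use before starting anything. The following is a rough sketch with the standard library only; `is_loopback` and `check_cluster_host` are hypothetical helpers, not HpBandSter functions, and the address 10.0.0.5 is just a placeholder.

```python
import ipaddress


def is_loopback(host):
    """Return True if host refers to the local loopback interface."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address; only the usual hostname is recognized here.
        return host == "localhost"


def check_cluster_host(host):
    """Refuse a loopback address, which other machines cannot reach."""
    if is_loopback(host):
        raise ValueError(
            f"{host} is a loopback address; workers on other nodes cannot connect to it"
        )
    return host


if __name__ == "__main__":
    print(check_cluster_host("10.0.0.5"))
```

On a cluster, the host would be an address on a network interface that all nodes share, rather than a hard-coded value.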

BTW: If you need more output, you could increase the logger level to debug, which would have shown that the master doesn't find any workers.
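Raising the log level can be done with Python's standard logging module near the top of the script. This is a minimal sketch that configures the root logger, on the assumption that the library's loggers inherit their level from it; the function name `enable_debug_logging` is made up for illustration.

```python
import logging


def enable_debug_logging():
    """Set the root logger to DEBUG and make sure it has a handler."""
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    if not root.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        root.addHandler(handler)
    return root


if __name__ == "__main__":
    enable_debug_logging()
    logging.getLogger(__name__).debug("debug output is now visible")
```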

sfalkner avatar Jun 14 '19 17:06 sfalkner

> BTW: If you need more output, you could increase the logger level to debug, which would have shown that the master doesn't find any workers.

Perfect! Thank you so much.

hrakhshani avatar Jun 14 '19 18:06 hrakhshani