Encountered ConnectionRefusedError instead of handle_resubmit

Open nick-youngblut opened this issue 5 years ago • 0 comments

If a job is given insufficient resources (e.g., 'h_rt=00:00:01' in map_reduce.py), it is not resubmitted automatically via the handle_resubmit function. Instead, I get:

2020-04-23 12:53:33,683 - gridmap.job - INFO - Encountered ConnectionRefusedError, so killing all jobs.
Traceback (most recent call last):
  File "./examples/map_reduce.py", line 131, in <module>
    main(args)
  File "./examples/map_reduce.py", line 121, in main
    queue=args.queue)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/site-packages/gridmap-0.14.0-py3.6.egg/gridmap/job.py", line 985, in grid_map
    require_cluster=require_cluster)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/site-packages/gridmap-0.14.0-py3.6.egg/gridmap/job.py", line 870, in process_jobs
    monitor.check(sid, jobs)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/site-packages/gridmap-0.14.0-py3.6.egg/gridmap/job.py", line 427, in check
    self.check_if_alive()
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/site-packages/gridmap-0.14.0-py3.6.egg/gridmap/job.py", line 493, in check_if_alive
    send_error_mail(job)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/site-packages/gridmap-0.14.0-py3.6.egg/gridmap/job.py", line 649, in send_error_mail
    _send_mail(subject, body_text, attachments)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/site-packages/gridmap-0.14.0-py3.6.egg/gridmap/job.py", line 539, in _send_mail
    s = smtplib.SMTP(SMTP_SERVER)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/smtplib.py", line 251, in __init__
    (code, msg) = self.connect(host, port)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/smtplib.py", line 336, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/smtplib.py", line 307, in _get_socket
    self.source_address)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/gridmap/lib/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
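
Judging from the traceback, the exception appears to be raised while the error email is being sent (send_error_mail, then _send_mail, then smtplib.SMTP), before any resubmission can happen. A minimal sketch of one possible guard follows, assuming the mail step can be treated as optional. The helper name `try_send_mail` is hypothetical and is not part of gridmap.

```python
import logging
import smtplib

logger = logging.getLogger(__name__)


def try_send_mail(send_func, *args, **kwargs) -> bool:
    """Call a mail-sending function and report whether it succeeded.

    Hypothetical helper (not part of gridmap): connection problems are
    logged and swallowed so that the caller can carry on.
    """
    try:
        send_func(*args, **kwargs)
    except (OSError, smtplib.SMTPException) as exc:
        # ConnectionRefusedError is a subclass of OSError, so an
        # unreachable SMTP server ends up here instead of propagating.
        logger.warning("Could not send error mail: %s", exc)
        return False
    return True
```

With a guard like this around the mail call, a refused SMTP connection would be logged as a warning rather than propagating up through check_if_alive.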

Also, it would help if the user could escalate job resources for each re-submission, as snakemake does.
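
As a rough illustration of what I mean by escalation, here is a minimal sketch that scales the 'h_rt' runtime limit with the attempt number. The function names and the doubling factor are assumptions made for this example; nothing here is an existing gridmap API.

```python
def format_h_rt(seconds: int) -> str:
    """Format a number of seconds as an 'HH:MM:SS' runtime string."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def escalated_h_rt(base_seconds: int, attempt: int, factor: int = 2) -> str:
    """Return the runtime limit for a given attempt (1 = first try).

    Hypothetical helper: the limit is multiplied by `factor` on each
    re-submission, so attempts 1, 2, 3 get 1x, 2x, 4x the base runtime.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return format_h_rt(base_seconds * factor ** (attempt - 1))


# Example: a one-hour base limit over three attempts.
for attempt in (1, 2, 3):
    print(attempt, "h_rt=" + escalated_h_rt(3600, attempt))
```

The same pattern could apply to memory or any other numeric resource, with the attempt counter supplied by whatever code handles the resubmission.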

nick-youngblut, Apr 23 '20 11:04