
Template that should run every 2 minutes is stuck in waiting since updating to 2.9.75

Open livdebus opened this issue 1 year ago • 10 comments

Since updating from 2.9.37 to 2.9.75, a template that is set to run every two minutes (via the template cron setting */2 * * * *) keeps getting stuck in status waiting. The only fix is to stop the last waiting job and reboot. It happens every day and does not recover by itself.

Other templates run fine, even while the affected template is stuck, so it only affects this template.

Anyone able to help?


livdebus avatar May 07 '24 11:05 livdebus

Same here. This does happen in a synchronize pull task. @livdebus is this also the case for you?

ivishared avatar May 14 '24 12:05 ivishared

This is the debug log output of the stuck task:

TASK [Pull web files to semaphore] *********************************************
task path: /home/semaphore/repository_6_36/FileTransmit.yml:35
[DEPRECATION WARNING]: The connection's stdin object is deprecated. Call display.prompt_until(msg) instead. This feature will be removed in version 2.19. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
redirecting (type: modules) ansible.builtin.synchronize to ansible.posix.synchronize
redirecting (type: action) ansible.builtin.synchronize to ansible.posix.synchronize
redirecting (type: action) ansible.builtin.synchronize to ansible.posix.synchronize
ESTABLISH LOCAL CONNECTION FOR USER: semaphore
EXEC /bin/sh -c '( umask 77 && mkdir -p " echo /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm "&& mkdir " echo /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235 " && echo ansible-tmp-1715689337.1515107-8357-129744323250235=" echo /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235 " ) && sleep 0'
Using module file /home/semaphore/.ansible/collections/ansible_collections/ansible/posix/plugins/modules/synchronize.py
PUT /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/tmpwx00atba TO /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/AnsiballZ_synchronize.py
EXEC /bin/sh -c 'chmod u+x /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/ /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/AnsiballZ_synchronize.py && sleep 0'
EXEC /bin/sh -c '/usr/bin/python3.11 /home/semaphore/.ansible/tmp/ansible-local-7673hkwayfsm/ansible-tmp-1715689337.1515107-8357-129744323250235/AnsiballZ_synchronize.py && sleep 0'
Running playbook failed: signal: killed

The definition of the task is:

- name: Pull web files to semaphore
  synchronize:
    mode: pull
    src: "{{ webpath }}/*"
    dest: "/tmp/{{ inventory_hostname }}"
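One possible stopgap for a task that hangs like this, not something confirmed in this thread, is Ansible's task-level `timeout` keyword (available since ansible-core 2.10), which fails the task once the limit is exceeded. The sketch below reuses the task definition from this comment; the 300-second value is an arbitrary assumption, and whether the underlying rsync process is cleaned up after a timeout has not been verified here.

```yaml
- name: Pull web files to semaphore
  ansible.posix.synchronize:
    mode: pull
    src: "{{ webpath }}/*"
    dest: "/tmp/{{ inventory_hostname }}"
  # Assumption: 300 seconds is an illustrative limit, not a recommended value.
  # When exceeded, Ansible interrupts the task and marks it as failed.
  timeout: 300
```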

ivishared avatar May 14 '24 12:05 ivishared

Same here. This does happen in a synchronize pull task. @livdebus is this also the case for you?

Nope, it is a task that runs a PowerShell script on a Windows host.

livdebus avatar May 14 '24 12:05 livdebus

But it seems that duplicating the affected task solved the issue for me; it has been stable for over 4 days now.

livdebus avatar May 14 '24 12:05 livdebus

Hi @tboerger can you reproduce this issue? I can't.

fiftin avatar May 23 '24 18:05 fiftin

@livdebus does the task work fine now?

fiftin avatar May 23 '24 19:05 fiftin

Me neither

tboerger avatar May 23 '24 20:05 tboerger

@livdebus does the task work fine now?

Yes, it still works fine since duplicating the affected task.

livdebus avatar May 24 '24 08:05 livdebus

Happens again: [screenshots]

This time duplicating the template did not help, but I have some more details. It seems that a previous task is hanging, and therefore all further tasks of the same template are stuck in queued status.

The task is hanging in a state where the actual playbook has not started yet (confirmed because the playbook would write to a custom log file, which did not happen), so it is still in the preparation stage (GitHub updating): [screenshot]

Template details: [screenshot]

Is there a Semaphore log file anywhere that would show more details?

The only workaround so far is to reboot the Semaphore host; after that, the runbooks work again for some hours.

livdebus avatar May 29 '24 11:05 livdebus

Or is there any way to define a timeout for a template, so that a single run is forced to stop after some time?

livdebus avatar May 29 '24 11:05 livdebus

+1 for having a way to define a timeout for a template. I've found myself in a similar situation, with multiple tasks waiting for another task to complete. The problem was a simple use of the apt module that was somehow blocked. Since you can't define a global timeout for a playbook in Ansible, having Semaphore stop, or at least warn about, hanging templates would be great.

cm-schl avatar Oct 29 '24 10:10 cm-schl
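
For the kind of hang described with the apt module, one partial option inside Ansible itself is the `timeout` keyword set at play level (ansible-core 2.10 or later), which each task in the play inherits. This is a per-task limit rather than a limit on the whole playbook run, and it cannot help when a run hangs before the playbook starts, as in the preparation-stage hang reported earlier in this thread. A minimal sketch, where the host pattern, the apt task, and the 600-second value are all illustrative assumptions:

```yaml
- name: Example play with a per-task time limit
  hosts: all            # Assumption: replace with the real host pattern.
  # Inherited by every task in this play; a task running longer than
  # 600 seconds is interrupted and marked as failed.
  timeout: 600
  tasks:
    - name: Update the apt cache
      ansible.builtin.apt:
        update_cache: true
```

A failed task ends the run for that host, so queued runs of the same template should no longer wait indefinitely on it. Whether that is enough in practice depends on where the hang occurs.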