Job failure as minikube logs rotate

Open bbourchanin opened this issue 10 months ago • 2 comments

Please confirm the following

  • [X] I agree to follow this project's code of conduct.
  • [X] I have checked the current issues for duplicates.
  • [X] I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • [X] I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

On an AWX/minikube deployment, I have to use a workaround in my playbook to avoid job failures caused by the minikube job pod's log rotating every 200 MB or so.

Note: specifying --log_file_max_size at minikube start has no effect (minikube 1.32).

Observed with long loops (800 items) over a small include_tasks file containing an incremental set_fact (I use include_tasks to avoid the pod being OOM-killed, which happens when everything runs within a single playbook).

The workaround is to set no_log on my set_fact tasks to avoid log bloat.

I hope this is clear.

Best regards,

AWX version

24.2.0

Select the relevant components

  • [ ] UI
  • [ ] UI (tech preview)
  • [ ] API
  • [ ] Docs
  • [ ] Collection
  • [ ] CLI
  • [X] Other

Installation method

minikube

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

No response

Steps to reproduce

Long loops (800 items) over a small include_tasks file, with an incremental set_fact (using combine) in the included file:

main playbook:

    - name: SNMP loop over the devices
      no_log: true
      ansible.builtin.include_tasks:
        file: gather_snmp_facts.yml
      loop: "{{ network_eqpts | dict2items }}"

included playbook (gather_snmp_facts.yml):

    - name: Enrich the device dictionary (os+version)
      # set_fact loops generate a large amount of logs.
      # It is better to disable them for large playbooks,
      # otherwise the AWX job may crash (BBO 12/04/2024)
      no_log: true
      vars:
        ios: "{{ stdout_cisco.stdout | regex_search('Cisco IOS(?!.+IOSXE)') }}"
        iosxe: "{{ stdout_cisco.stdout | regex_search('IOSXE') }}"
        nxos: "{{ stdout_cisco.stdout | regex_search('NX-OS') }}"
        # For Nexans devices, the vendor is identified by
        # the 54VDC description (switches built into the power cable trunking)
        nexans: "{{ stdout_cisco.stdout | regex_search('54VDC') }}"
        checkpoint: "{{ stdout_checkpoint.stdout | default('') | regex_search('cpx86_64') }}"
        cisco_version: "{{ stdout_cisco.stdout | regex_search('(?i)(?<=version )([^ ,]+)') }}"
        checkpoint_version: "{{ stdout_checkpoint.stdout | default('') | regex_search('(?i)([^ ]+)(?=cpx86_64)') }}"
      ansible.builtin.set_fact:
        network_eqpts: "{{ network_eqpts | combine( { item.key : item.value | combine ({'os': {'Cisco IOS': 'IOS', 'IOSXE': 'IOS-XE', 'NX-OS': 'NX-OS', '54VDC': 'Nexans', 'cpx86_64': 'Checkpoint'}[ios+iosxe+nxos+nexans+checkpoint] | default(''), 'version': cisco_version+checkpoint_version}) } ) }}"

(The pod logs show that each set_fact iteration repeats the complete fact, even though it does not appear in stdout.)
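To give a feel for why this pattern bloats the pod log, here is a rough back-of-the-envelope sketch in Python. It assumes that every iteration writes the whole network_eqpts dictionary to the pod log once, and that entries grow slightly as the os and version keys are added. The function name and the byte sizes are hypothetical, chosen only for illustration; they were not measured on the deployment described here.

```python
def estimated_log_bytes(n_items: int, base_entry_bytes: int, added_bytes: int) -> int:
    """Estimate total bytes logged by a loop that re-logs the full fact each iteration.

    n_items          -- number of loop iterations (one per dictionary entry)
    base_entry_bytes -- assumed serialized size of one entry before enrichment
    added_bytes      -- assumed extra bytes per entry once 'os'/'version' are added
    """
    total = 0
    for done in range(1, n_items + 1):
        # After `done` iterations, `done` entries carry the extra keys,
        # and the complete dictionary is logged once more.
        fact_size = n_items * base_entry_bytes + done * added_bytes
        total += fact_size
    return total


# Hypothetical sizes: 800 entries of ~300 bytes, ~40 bytes added per enriched entry.
print(f"{estimated_log_bytes(800, 300, 40) / 1e6:.1f} MB")
```

The total grows with the square of the number of items, so under these assumptions doubling the loop length roughly quadruples the log volume. That is consistent with no_log on the set_fact task being an effective workaround.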

Expected results

The job runs to the end of the playbook.

Actual results

The job is interrupted (reported as failed) when the pod logs rotate, and stdout is truncated.

Additional information

No response

bbourchanin avatar Apr 12 '24 07:04 bbourchanin

Do you have the Receptor reconnect feature enabled?

See the original PR, which also explains how to enable it: https://github.com/ansible/receptor/pull/683

fosterseth avatar Apr 12 '24 17:04 fosterseth

Hi,

I have not tried it, I must say. I assumed the behaviour would be fine with RECEPTOR_KUBE_SUPPORT_RECONNECT left at its default of "auto", as this thread seems to suggest.

I will give it a try.

bbourchanin avatar Apr 22 '24 11:04 bbourchanin

I cannot reproduce the problem right now. I am closing this for now and will see ...

Best regards,

bbourchanin avatar May 03 '24 09:05 bbourchanin