K8s JSON parse error
Please confirm the following
- [X] I agree to follow this project's code of conduct.
- [X] I have checked the current issues for duplicates.
- [X] I understand that AWX is open source software provided for free and that I might not receive a timely response.
- [X] I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)
Bug Summary
The error started to happen around 3 days ago. I tested it on AWX 23.7.0, 23.8.0, and 23.8.1.

ee_extra_env: |
  - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
    value: disabled

also doesn't help. Based on other similar bug reports, I also tested the watcher limit and the max file limit. The problem happens only on k8s hosts; after setting receptor_kube_support_reconnect to disabled, the job finishes successfully, but the error in the web UI still persists. Based on some comments, I also tried awx-ee in most of the available versions, unfortunately without success. The automation_job pod logs are large, which might be the problem, so my next step was to increase container-log-max-size, but that didn't help either. Changing the verbosity level on the job doesn't change anything either.
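For reference, a minimal sketch of where ee_extra_env sits in the AWX custom resource when using the AWX Operator; the metadata name and namespace here are placeholders, not taken from this report:

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx        # placeholder instance name
  namespace: awx   # placeholder namespace
spec:
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: disabled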
AWX version
AWX 23.7.0 & 23.8.0 & 23.8.1 (tried on all 3)
Select the relevant components
- [X] UI
- [ ] UI (tech preview)
- [ ] API
- [ ] Docs
- [ ] Collection
- [ ] CLI
- [ ] Other
Installation method
kubernetes
Modifications
no
Ansible version
ansible 2.9.6 on the host
Operating system
Debian
Web browser
Firefox, Chrome, Safari, Edge
Steps to reproduce
Run a playbook on a large k8s cluster (I tried 3 different k8s clusters).
Expected results
The job finishes successfully and is reported as successful in the web UI as well.
Actual results
The job finishes, but the error in the web UI persists, as in the screenshot above.
Additional information
No response
Experiencing the same issue, but with inventory sync.

kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
Node1   Ready    control-plane   543d   v1.25.0
Node2   Ready

[root@astdc-k8sawx01p ~]# kubelet --version
Kubernetes v1.25.0

The AWX UI errors out.
For these jobs that are ending in Error: are your automation job pods completing successfully? You can disable pod cleanup by adding

extra_settings:
  - setting: RECEPTOR_RELEASE_WORK
    value: "False"

to your AWX spec file (note: only do this for debugging purposes!).

Are the pods in a Completed status? If you tail the logs of the job pod, do you see the zipfile contents?
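A quick way to check both of those from the CLI; this is a sketch assuming the default awx namespace and the usual automation-job-<job_id>-* pod naming:

# list automation job pods and their status (namespace is an assumption)
kubectl get pods -n awx | grep automation-job

# tail the end of a job pod's log; the zipfile contents should be the last thing printed
kubectl logs -n awx automation-job-<job_id>-<suffix> --tail=20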
For these jobs that are ending in Error: are your automation job pods completing successfully? Are the pods in a Completed status?

Yes, they end successfully. Based on kubectl logs on the automation_job pod, I see the full output, and yes, they have Completed status.

If you tail the logs of the job pod, do you see the zipfile contents?

Also yes, the log ends with the zipfile contents, and the jobs also complete correctly on the target machines.
@marianskrzypekk based on the error message in your screenshot (next time please copy and paste), it seems like the data stream was cut mid-"line", causing a malformed message.
Please go to /api/v2/jobs/<job_id>/ and provide us with the result_traceback for further debugging.
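For example, a sketch of pulling that field straight from the API; the host, credentials, and job id here are placeholders:

curl -s -u <user>:<password> https://<awx_host>/api/v2/jobs/<job_id>/ | jq -r .result_traceback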
Since this issue has been open since February: if the problem has been resolved and/or is no longer reproducible, please close this issue.