K8S json parse error

Open marianskrzypekk opened this issue 1 year ago • 4 comments

Please confirm the following

  • [X] I agree to follow this project's code of conduct.
  • [X] I have checked the current issues for duplicates.
  • [X] I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • [X] I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

[screenshot of the error]

The error started appearing about 3 days ago. I tested it on AWX 23.7.0, 23.8.0, and 23.8.1.

  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: disabled

Setting this also doesn't help. Based on other similar bug reports, I also tested the watchers limit and the max file limit. The problem happens only on k8s hosts. After setting RECEPTOR_KUBE_SUPPORT_RECONNECT to disabled, the job finishes successfully, but the error in the web UI still persists. Based on some comments, I also tried most of the available awx-ee versions, unfortunately without success. The automation_job pod logs are large, which might be the problem, so the next thing I tried was increasing container-log-max-size, but that didn't help either. Changing the job's verbosity level also doesn't change anything.

AWX version

AWX 23.7.0 & 23.8.0 & 23.8.1 (tried on all 3)

Select the relevant components

  • [X] UI
  • [ ] UI (tech preview)
  • [ ] API
  • [ ] Docs
  • [ ] Collection
  • [ ] CLI
  • [ ] Other

Installation method

kubernetes

Modifications

no

Ansible version

on host ansible 2.9.6

Operating system

Debian

Web browser

Firefox, Chrome, Safari, Edge

Steps to reproduce

Run a playbook on a large k8s cluster (I tried 3 different k8s clusters).

Expected results

The job finishes successfully in the web UI as well.

Actual results

The job finishes, but the error in the web UI persists, as in the screenshot above.

Additional information

No response

marianskrzypekk avatar Feb 16 '24 15:02 marianskrzypekk

Experiencing the same issue, but with inventory sync:

kubectl get nodes

  NAME    STATUS   ROLES           AGE    VERSION
  Node1   Ready    control-plane   543d   v1.25.0
  Node2   Ready    <none>          543d   v1.25.0
  Node3   Ready    <none>          543d   v1.25.0
  Node4   Ready    <none>          543d   v1.25.0

kubelet --version

  [root@astdc-k8sawx01p ~]# kubelet --version
  Kubernetes v1.25.0

The AWX UI errors out.

AdityaVishwekar avatar Feb 19 '24 07:02 AdityaVishwekar

for these jobs that are ending in Error -- are your automation job pods completing successfully? you can disable pod cleanup by adding

  extra_settings:
  - setting: RECEPTOR_RELEASE_WORK
    value: "False"

to your AWX spec file (note only do this for debugging purposes!)

Are the pods in a Completed status? if you tail the logs of the job pod, do you see the zipfile contents?

fosterseth avatar Feb 21 '24 18:02 fosterseth

for these jobs that are ending in Error -- are your automation job pods completing successfully? Are the pods in a Completed status?

Yes, they end successfully. Based on kubectl logs on the automation_job pod I see the full output, and yes, the pods have Completed status.

if you tail the logs of the job pod, do you see the zipfile contents?

Also yes: the log ends with the zipfile contents, and the jobs on the machines are also done correctly.

marianskrzypekk avatar Feb 22 '24 14:02 marianskrzypekk

@marianskrzypekk based on the error message in your screenshot (next time please copy and paste the text), it seems the data stream was cut mid-"line", causing a malformed message.
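To illustrate the kind of failure described here, the following is a minimal Python sketch, not AWX or receptor code. It assumes the job output arrives as one JSON object per line; the function name `parse_event_lines` and the sample events are hypothetical. A line that is cut off partway through fails to parse, while the complete lines before it parse normally.

```python
import json


def parse_event_lines(stream: str):
    """Split a newline-delimited JSON stream into parsed events and malformed lines.

    Hypothetical helper for illustration only; it is not part of AWX.
    """
    events, malformed = [], []
    for line in stream.splitlines():
        if not line.strip():
            continue  # skip blank lines
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # A truncated line ends up here with the parser's error message.
            malformed.append((line, str(exc)))
    return events, malformed


# Made-up sample: two complete events, then a third cut off mid-line.
sample = (
    '{"event": "playbook_on_start", "counter": 1}\n'
    '{"event": "runner_on_ok", "counter": 2}\n'
    '{"event": "runner_on_ok", "cou'
)
events, malformed = parse_event_lines(sample)
print(len(events), "parsed;", len(malformed), "malformed")
```

Running a captured pod log through a check like this may help show whether, and where, a line was truncated.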

Please go to /api/v2/jobs/<job_id>/ and provide us with the result_traceback for further debugging.
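If the browser is inconvenient, the same field can be pulled with a short script. This is a rough sketch under assumptions: the job detail endpoint is /api/v2/jobs/<job_id>/, authentication uses an OAuth2 token sent as a Bearer header, and the base URL, token, and helper names (`job_url`, `extract_traceback`, `fetch_traceback`) are placeholders invented for this example.

```python
import json
import urllib.request


def job_url(base_url: str, job_id: int) -> str:
    """Build the job detail URL (assumed endpoint: /api/v2/jobs/<job_id>/)."""
    return f"{base_url.rstrip('/')}/api/v2/jobs/{job_id}/"


def extract_traceback(job_json: str) -> str:
    """Return the result_traceback field from a job detail response body."""
    return json.loads(job_json).get("result_traceback", "")


def fetch_traceback(base_url: str, job_id: int, token: str) -> str:
    """Fetch a job and return its result_traceback.

    Assumes token-based auth; adjust the header if your setup differs.
    """
    request = urllib.request.Request(
        job_url(base_url, job_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_traceback(response.read().decode("utf-8"))
```

For example, `fetch_traceback("https://awx.example.com", 1234, "<your token>")` would return the traceback text for job 1234, where the host and job ID are made up.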

Since this issue was opened in February: if the problem has been resolved or is no longer reproducible, please close this issue.

TheRealHaoLiu avatar May 29 '24 15:05 TheRealHaoLiu