
AWX stops gathering job output if kubernetes starts a new log

Open dbanttari opened this issue 2 years ago • 18 comments

Please confirm the following

  • [X] I agree to follow this project's code of conduct.
  • [X] I have checked the current issues for duplicates.
  • [X] I understand that AWX is open source software provided for free and that I am not entitled to status updates or other assurances.

Summary

If a very large job is run, sometimes Kubernetes will start a new log, but AWX's reporting of job progress stalls. Eventually the job ends as an Error, even though the job itself continues to completion in the pod.

AWX version

19.4.0

Installation method

kubernetes

Modifications

yes

Ansible version

2.11.3.post0

Operating system

Azure Kubernetes Service 1.21.2

Web browser

Chrome

Steps to reproduce

We can tail the log of the pod in real time using kubectl

->k logs -f -n awx automation-job-29506-2qlhn
...
...
...
{"uuid": "021508bc-6424-4b63-a394-a42f0aaeab36", "counter": 42648, "stdout": "", "start_line": 23274, "end_line": 23274, "runner_ident": "29506", "event": "runner_on_start", "job_id": 29506, "pid": 19, "created": "2021-11-11T19:51:19.974410", "parent_uuid": "261db21c-7d3e-ca0c-65e3-000000042eae", "event_data": {"playbook": "site.yml", "playbook_uuid": "3cc1df7c-23bc-4641-8262-1eaf9a73ea2f", "play": "tag_role_edapp:tag_role_eda:tag_role_edweb:tag_role_edw:tag_role_rhb*:tag_role_rwb1*:tag_role_vhb*:tag_role_vtxa*:tag_role_vtxw*:tag_role_vweb:tag_role_vws1*:tag_role_vws2*:tag_role_vwsa:tag_role_wvd1", "play_uuid": "261db21c-7d3e-ca0c-65e3-00000000031f", "play_pattern": "tag_role_edapp:tag_role_eda:tag_role_edweb:tag_role_edw:tag_role_rhb*:tag_role_rwb1*:tag_role_vhb*:tag_role_vtxa*:tag_role_vtxw*:tag_role_vweb:tag_role_vws1*:tag_role_vws2*:tag_role_vwsa:tag_role_wvd1", "task": "Create component in udeploy", "task_uuid": "261db21c-7d3e-ca0c-65e3-000000042eae", "task_action": "uri", "task_args": "", "task_path": "/runner/project/roles/udeploy-agent/tasks/add_component.yml:26", "role": "udeploy-agent", "host": "**********", "uuid": "021508bc-6424-4b63-a394-a42f0aaeab36"}}

*** this window stalls until the job is complete and the pod exits ***

*** opened a new window and it resumes: ***

->k logs -f -n awx automation-job-29506-2qlhn
{"uuid": "7193d6f6-1f7e-48c6-8f4d-b764817d31cb", "counter": 56862, "stdout": "\u001b[0;32mok: [dvuc-rhb1b-vm0]\u001b[0m", "start_line": 32869, "end_line": 32870, "runner_ident": "29506", "event": "runner_on_ok", "job_id": 29506, "pid": 19, "created": "2021-11-11T20:03:21.402116", "parent_uuid": "261db21c-7d3e-ca0c-65e3-0000000505be", "event_data": {"playbook": "site.yml", "playbook_uuid": "3cc1df7c-23bc-4641-8262-1eaf9a73ea2f", "play": "tag_role_rhb*:tag_role_rwb1*:tag_role_vtxa*:tag_role_vtxw*:tag_role_vweb:tag_role_vws1*:tag_role_vws2*:tag_role_vwsa", "play_uuid": "261db21c-7d3e-ca0c-65e3-0000000003c3", "play_pattern": "tag_role_rhb*:tag_role_rwb1*:tag_role_vtxa*:tag_role_vtxw*:tag_role_vweb:tag_role_vws1*:tag_role_vws2*:tag_role_vwsa", "task": "Make renamed executable for easier troubleshooting in multi-instance installations", "task_uuid": "261db21c-7d3e-ca0c-65e3-0000000505be", "task_action": "ansible.windows.win_copy", "task_args": "", "task_path": "/runner/project/roles/tomcat/tasks/install-tomcat.yml:20", "role": "tomcat", "host": "***********", "remote_addr": "**********", "res": {"src": "**********", "dest": "**********", "checksum": null, "size": 131072, "original_basename": "tomcat9.exe", "changed": false, "operation": "file_copy", "_ansible_no_log": false}, "start": "2021-11-11T20:03:20.256071", "end": "2021-11-11T20:03:21.401776", "duration": 1.145705, "event_loop": null, "uuid": "7193d6f6-1f7e-48c6-8f4d-b764817d31cb"}}
...
...
...
{"status": "failed", "runner_ident": "29506"}
{"zipfile": 85344}
*** zipfile contents here ***
{"eof": true}

Note that this example is discontiguous (the counter jumps from 42648 to 56862), which implies that this may happen multiple times during a pod's life.
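
If you have access to the node, one way to confirm that a kubelet log rotation is what interrupts the stream is to look at the pod's on-disk log directory. This is only a hedged sketch: the path layout below is the usual kubelet/CRI convention, and the namespace, pod UID, and container name are placeholders for whatever your install uses.

# On the node running the automation job pod (namespace, UID, and container name are placeholders)
ls -lh /var/log/pods/awx_automation-job-29506-2qlhn_<pod-uid>/<container>/
# 0.log                   <- the file "kubectl logs -f" is currently following
# 0.log.<timestamp>.gz    <- rotated chunk(s); the output AWX stopped receiving ends up here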

Expected results

Expected AWX to continue reporting the progress of the job and to note its proper completion status (in this case, "Failed").

Actual results

The last item presented by AWX was the last entry in the log before it started anew:

[Screenshot: Screen Shot 2021-11-11 at 2 10 28 PM]

Additional information

The Ansible pod image is based on awx-ee:0.6.0 but adds things like helm, openssl, the Azure CLI, zabbix-api, and other libraries that are used by various tasks.

Installation was done via AWX Operator 0.14.0

dbanttari avatar Nov 11 '21 20:11 dbanttari

This seems similar/related? https://github.com/kubernetes/kubernetes/issues/28369

dbanttari avatar Nov 16 '21 21:11 dbanttari

We recently saw this in an AKS deployment. I've verified a workaround by setting kubelet --container-log-max-size to a higher number, or using containerLogMaxSize in a kubelet config file (rough sketch after the links below).

  • https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
  • https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
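
For self-managed kubelets, the two equivalent knobs look roughly like this. This is a sketch only: the config file path and values are assumptions, and managed offerings (AKS, GKE, etc.) expose the same settings through their own node-pool options instead of direct file edits.

# Flag form, e.g. added to the kubelet unit's arguments:
#   --container-log-max-size=500Mi --container-log-max-files=5

# KubeletConfiguration form (often /var/lib/kubelet/config.yaml), guarded so it is only appended once:
grep -Eq 'containerLogMax(Size|Files)' /var/lib/kubelet/config.yaml || \
  printf 'containerLogMaxSize: "500Mi"\ncontainerLogMaxFiles: 5\n' | sudo tee -a /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet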

shanemcd avatar Nov 18 '21 09:11 shanemcd

https://github.com/ansible/awx/issues/10366#issuecomment-951852069 @dbanttari this might fix your issue. Note the below modification that is more generic and doesn't rely on the max-pods being set to 30. Can you try it and report back?

if ! grep -q container-log-max-size /etc/default/kubelet; then sed -E -i 's/max-pods=([0-9]*)/max-pods=\1 --container-log-max-size=500Mi/g' /etc/default/kubelet; systemctl restart kubelet; fi
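
If you try it, a quick sanity check that the flag landed exactly once (a sketch; the file path is the one from the one-liner above):

grep -o -- '--container-log-max-size=[^ ]*' /etc/default/kubelet | wc -l   # expect 1
systemctl is-active kubelet                                                # expect "active"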

chrismeyersfsu avatar Nov 19 '21 18:11 chrismeyersfsu

Hello, we are seeing the same issue with our AWX instance (same version). We are running in a GKE cluster, so we don't have control over the kubelet configuration. Do you plan to fix this soon?

Tioborto avatar Dec 10 '21 15:12 Tioborto

I'm not sure how we would fix this without completely rearchitecting things to not rely on the kubernetes logging system. We don't have any plans to do this right now.

shanemcd avatar Dec 14 '21 08:12 shanemcd

Hello @shanemcd, this is a real problem for us. We have users with jobs in "ERROR" status while we have the full logs in our external tool. Why is this not a top priority? This is a regression from previous versions.

Tioborto avatar Dec 20 '21 08:12 Tioborto

I'm not sure how we would fix this without completely rearchitecting things to not rely on the kubernetes logging system. We don't have any plans to do this right now.

Running this on Kubernetes 1.20+ should be called strictly unsupported and broken, I feel.

bschaeffer avatar Jan 07 '22 19:01 bschaeffer

From #11511:

FWIW, this does seem to be related to container runtime. In GKE, we recently upgraded to 1.20 which defaults to containerd at the same time we saw this error. When we rolled back to 1.19, it was also broken, but we realised it was still using containerd://1.4.8.

When we switched the runtime to docker://19.3.15, it actually fixed our problem.

So things are now in a working state, but it will be EOL for us in about 5 months.

bschaeffer avatar Jan 18 '22 14:01 bschaeffer

I'm not sure how we would fix this without completely rearchitecting things to not rely on the kubernetes logging system. We don't have any plans to do this right now.

I agree this is a containerd/k8s issue. But it is also best practice to limit logs from filling up filesystems.

But relying on the logging systems of k8s does not seem reliable.

Are the following options valid? Move to OpenShift? Is it possible to move the execution nodes to podman (and does podman have this issue?)

MalfuncEddie avatar Jan 21 '22 08:01 MalfuncEddie

I'm not sure how we would fix this without completely rearchitecting things to not rely on the kubernetes logging system. We don't have any plans to do this right now.

If this https://github.com/kubernetes/kubernetes/issues/59902 were implemented, wouldn't that help?

I can also confirm it is an issue for us too. For some problematic playbooks with lots of output and multiple hosts, the only solution we found was to create a playbook that creates a pod which runs the playbook directly on the k8s cluster, so outside of AWX :-( In our case, increasing the log size is not an option, because we have other workloads on the cluster and increasing the log file size would drastically increase disk space usage.

Thanks

oliverf1 avatar Feb 01 '22 05:02 oliverf1

Hi, same problem in k3s; this is a real problem for us also.

RHEL 8.4, Version: 0.1.0, RuntimeName: containerd, RuntimeVersion: v1.5.8-k3s1, RuntimeApiVersion: v1alpha2, k3s 1.22

Fixed with kubelet --container-log-max-size.

chris93111 avatar Feb 24 '22 12:02 chris93111

Using AKS and Terraform we were able to get enough headroom by specifying:

resource "azurerm_kubernetes_cluster_node_pool" "pool" { 
  ...

  kubelet_config = {
    container_log_max_line    = 5
    container_log_max_size_mb = 500
  }
  ...
}

The default, I believe, was 10MB.

jogleasonjr avatar Feb 24 '22 15:02 jogleasonjr

#10366 (comment) @dbanttari this might fix your issue. Note the below modification that is more generic and doesn't rely on the max-pods being set to 30. Can you try it and report back?

if ! grep -q container-log-max-size /etc/default/kubelet; then sed -E -i 's/max-pods=([0-9]*)/max-pods=\1 --container-log-max-size=500Mi/g' /etc/default/kubelet; systemctl restart kubelet; fi

Hey, I tried your suggestion but included --container-log-max-files=3 before I read that 5 is the default value... And even though the DaemonSet pods failed to start with CrashLoopBackOff and I deleted them a couple of minutes later, somehow the kubelet config file now looks like this:

KUBELET_FLAGS=--address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook
  --azure-container-registry-config=/etc/kubernetes/azure.json --cgroups-per-qos=true --client-ca-file=/etc/kubernetes/certs/ca.crt
  --cloud-config=/etc/kubernetes/azure.json --cloud-provider=azure --cluster-dns=10.0.0.0 --cluster-domain=cluster.local
  --enforce-node-allocatable=pods --event-qps=0 --eviction-hard=memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%
  --feature-gates=RotateKubeletServerCertificate=true --image-gc-high-threshold=85 --image-gc-low-threshold=80
  --image-pull-progress-deadline=30m --keep-terminated-pod-volumes=false --kube-reserved=cpu=100m,memory=1843Mi
  --kubeconfig=/var/lib/kubelet/kubeconfig --max-pods=35
  --container-log-max-size=500Mi --container-log-max-files=3
  [... the pair --container-log-max-size=500Mi --container-log-max-files=3 repeated dozens more times ...]
  --network-plugin=cni --node-status-update-frequency=10s --non-masquerade-cidr=0.0.0.0/0
  --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.5 --pod-manifest-path=/etc/kubernetes/manifests
  --pod-max-pids=-1 --protect-kernel-defaults=true --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf
  --rotate-certificates=false --streaming-connection-idle-timeout=4h --tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt
  --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256
  --tls-private-key-file=/etc/kubernetes/certs/kubeletserver.key
KUBELET_REGISTER_SCHEDULABLE=true
NETWORK_POLICY=azure

KUBELET_NODE_LABELS=kubernetes.azure.com/role=agent,agentpool=nodepoold,kubernetes.azure.com/agentpool=nodepoold,storageprofile=managed,storagetier=Premium_LRS,kubernetes.azure.com/storageprofile=managed,kubernetes.azure.com/storagetier=Premium_LRS,kubernetes.azure.com/os-sku=Ubuntu,kubernetes.azure.com/cluster=MC_rg_cluster_westeurope,kubernetes.azure.com/mode=system,kubernetes.azure.com/node-image-version=AKSUbuntu-1804gen2containerd-2021.09.28,nodepool=defaultnodepoolnew

root@aks-nodepoolD:/etc/default#

Funny enough, everything works exactly as before and, as far as I can tell, nothing was removed; only the two parameters got added a gazillion times... Anyway, I would like to fix this mess and restore the default kubelet config file if possible.

Could anyone suggest how to fix this please?
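
A possible cleanup sketch, assuming the duplicates are exactly the " --container-log-max-size=500Mi --container-log-max-files=3" pairs shown in the dump above; take a backup first, and note that AKS normally regenerates /etc/default/kubelet when a node is reimaged or upgraded, which should also restore the defaults:

sudo cp /etc/default/kubelet /etc/default/kubelet.bak
# Collapse any run of the repeated flag pair down to a single occurrence
sudo sed -E -i 's/( --container-log-max-size=500Mi --container-log-max-files=3)+/\1/g' /etc/default/kubelet
sudo systemctl restart kubelet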

def1nity avatar Feb 25 '22 16:02 def1nity

Luckily for us, I've also set up ARA, so when the log file rotates, ARA still has the complete job data.

ThomasADavis avatar Mar 01 '22 20:03 ThomasADavis

Any update on the resolution of this problem, @shanemcd, please?

Tioborto avatar Mar 22 '22 14:03 Tioborto

Also see some good quality research in the ansible-runner issue https://github.com/ansible/ansible-runner/issues/998

This is particularly a problem for inventory updates that produce large inventories, because the archive of the artifacts necessarily has to be large, and the way that is sent requires precise byte alignment or else it won't work.

Trying to think of where / how a patch might be developed... a change to awx-operator might just increase the size that it can handle but not fundamentally address the issue. If I could reproduce it, maybe I could get the syntax of what the controller node reads, and figure out how to identify these breaks in the ansible-runner code.

AlanCoding avatar Apr 08 '22 14:04 AlanCoding

For me, the same behavior happens when the automation-job pod finishes and is destroyed. In the pod logs I could see that everything completed correctly, but the AWX job ends in Error with a trimmed log output.

cerberek avatar Jun 15 '22 14:06 cerberek

@AlanCoding to force a reproducer, you could try setting kubelet --container-log-max-size to a very small number
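
A disposable single-node reproducer along those lines might look like this (a sketch; the values are deliberately tiny so that any verbose playbook forces a rotation within seconds, and the k3s install command mirrors the one further down this thread):

curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 \
  --kubelet-arg "container-log-max-size=1Mi" \
  --kubelet-arg "container-log-max-files=2"
# Then deploy AWX via awx-operator and launch any job template whose playbook
# prints a few MiB of stdout; the job should flip to Error once the log rotates.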

kdelee avatar Jul 27 '22 18:07 kdelee

Any update on the resolution?

ethemcemozkan avatar Sep 22 '22 06:09 ethemcemozkan

Hi, same problem in k3s; this is a real problem for us also.

RHEL 8.4, Version: 0.1.0, RuntimeName: containerd, RuntimeVersion: v1.5.8-k3s1, RuntimeApiVersion: v1alpha2, k3s 1.22

Fixed with kubelet --container-log-max-size.

Hi - trying to apply the mentioned workaround for k3s, too, but currently I'm unsure where and how to apply the container-log-max-size configuration. Could you please give me a hint on where and how you deployed that?

Thanks,

Andreas

andreasbourges avatar Sep 27 '22 06:09 andreasbourges

@andreasbourges Modify /etc/systemd/system/k3s.service and restart K3s,

$ cat /etc/systemd/system/k3s.service
...
ExecStart=/usr/local/bin/k3s \
    server \
        '--write-kubeconfig-mode' \
        '644' \
        '--kubelet-arg' \     👈👈👈
        'container-log-max-files=4' \     👈👈👈
        '--kubelet-arg' \     👈👈👈
        'container-log-max-size=50Mi' \     👈👈👈

$ sudo systemctl daemon-reload
$ sudo systemctl restart k3s

Or re-install K3s using script with specific arguments,

$ curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --kubelet-arg "container-log-max-files=4" --kubelet-arg "container-log-max-size=50Mi"
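
To confirm the kubelet actually picked the new values up, you can read back the node's live configuration; treat this as a sketch (it needs permission to use the node proxy subresource):

# Read the running kubelet config for the first node and show the log-rotation fields
NODE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" | grep -o '"containerLog[^,]*'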

kurokobo avatar Sep 27 '22 06:09 kurokobo

@andreasbourges Modify /etc/systemd/system/k3s.service and restart K3s,

$ cat /etc/systemd/system/k3s.service
...
ExecStart=/usr/local/bin/k3s \
    server \
        '--write-kubeconfig-mode' \
        '644' \
        '--kubelet-arg' \     👈👈👈
        'container-log-max-files=4' \     👈👈👈
        '--kubelet-arg' \     👈👈👈
        'container-log-max-size=50Mi' \     👈👈👈

$ sudo systemctl daemon-reload
$ sudo systemctl restart k3s

Or re-install K3s using script with specific arguments,

$ curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --kubelet-arg "container-log-max-files=4" --kubelet-arg "container-log-max-size=50Mi"

Thanks a lot - this saves me a lot of time!

Highly appreciated!

andreasbourges avatar Sep 27 '22 06:09 andreasbourges

Hi! I have been reading all your valuable information and tests for the last few days, because we are facing the same issue. I just wanted to let you know that, in our case, changing the log configuration didn't solve it, so I'm afraid we have hit the wall of the "4h maximum connection time" discussed in this other issue:

https://github.com/ansible/awx/issues/11805#issuecomment-1249169747

anasaizg avatar Sep 27 '22 08:09 anasaizg

Hi,

thanks for sharing your experience - I adjusted the amount of data processed in the playbook (way too much information was gathered via an HTTP request to Autobot) and I adjusted the container log size.

What I can tell is that opening the output log of the aborted tasks still triggers uwsgi to consume GBs of memory. We're hoping that reducing memory consumption will prevent the tasks from failing and thus avoid the uwsgi problem.

Well, let's see.

Thanks,

Andreas

Edit by Gundalow to tidy up formatting

andreasbourges avatar Sep 27 '22 08:09 andreasbourges

Just want to say that increasing the size before log rotation does not solve the problem. It just decreases the likelihood that your run will encounter or trigger a log rotation that causes the error. It's not a solution to this problem in any way.

bschaeffer avatar Oct 12 '22 14:10 bschaeffer

Hi Braden,

I totally agree - this is a non-deterministic workaround, but in combination with the reduction of the generated logs we're running fine so far - not a single error since the change (before, we had a 50:50 chance). And in the absence of any alternatives, we are happy to have this one.

Do you know of any other ways to solve this?

Thanks,

Andreas


andreasbourges avatar Oct 12 '22 15:10 andreasbourges

fixed in https://github.com/ansible/receptor/pull/683/

TheRealHaoLiu avatar Dec 14 '22 16:12 TheRealHaoLiu

fixed in ansible/receptor#683

Not for me — I seem to still get only the first chunk of logs. Running AWX receptor version 1.4.1 against OpenShift 3.11's (I know, I know) Kubernetes 1.11.0+d4cacc0.

It is worth noting that the /tmp/receptor/awx-0/*/stdout files do not end with (or contain) an { "eof": true } marker in this case, indicating that Receptor at least understands that it has work remaining.
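
A quick way to spot work units that never received that marker, using the paths mentioned above (the glob is an assumption about this particular install's receptor data directory; run it inside the container that owns /tmp/receptor):

# List receptor work-unit stdout files that do not contain the final EOF event
grep -L '"eof": true' /tmp/receptor/awx-0/*/stdout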

I'm not sure how we would fix this without completely rearchitecting things to not rely on the kubernetes logging system.

Indeed. I seem to remember that at some point, AWX used to rely on RabbitMQ for this task?

domq avatar Jul 05 '23 10:07 domq

This issue seems finally fixed for me with Kubernetes 1.29.

Klaas- avatar Jan 09 '24 13:01 Klaas-