compliantkubernetes-apps
fluentd: stuck ingesting ingress-nginx logs
Describe the bug
fluentd is using a lot of CPU (up to 2 cores) on ingress-nginx logs, but it does not manage to process them. It also keeps trying to ingest these logs; I found logs older than 2 weeks still being processed.
To Reproduce
The fluentd logs do not show an error and the pods just fail under CPU throttling, but I noticed the following patterns that seem to be an issue:
broken header: \"\u0016\u0003\u0001\u0000{\u0001\u0000\u0000w\u0003\u0003\ufffdL\u000ea\u001d\ufffd\ufffd\u0014\ufffd~\ufffd\u000f\ufffd0\u0011\ufffd^?\ufffd\ru\u0000DԸ(\ufffd\ufffd\ufffdaL\ufffdK\u0000\u0000\u001a\ufffd/\ufffd+\ufffd\u0011\ufffd\u0007\ufffd\u0013\ufffd\u0009\ufffd\u0014\ufffd\n"
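The "broken header" payload starts with the bytes 0x16 0x03 0x01, i.e. a raw TLS handshake, so these lines are binary noise (typically produced when a client sends TLS to a port where nginx expects plain HTTP or a PROXY protocol header) and they are expensive to parse and index. A possible stop-gap, sketched under the assumptions that container logs are tagged kubernetes.var.log.containers.** and carry the line in a message field, would be to drop these lines with fluentd's built-in grep filter before they reach the OpenSearch output:

```
# Sketch only: drop "broken header" lines from container logs before output.
# The tag pattern and the "message" field name are assumptions, not confirmed
# against the Compliant Kubernetes fluentd configuration.
<filter kubernetes.var.log.containers.**>
  @type grep
  <exclude>
    key message
    pattern /broken header/
  </exclude>
</filter>
```

Note that this filter matches every container log stream, so it would silently drop any line containing "broken header", not only the ingress-nginx ones.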
There is also this kind of error that increases fluentd resource usage:
2022-12-01 07:53:57.582275069 +0000 fluent.warn: {"error":"#<Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError: 400 - Rejected by OpenSearch [error type]: mapper_parsing_exception [reason]: 'failed to parse field [message] of type [text] in document with id 'nlKtzIQB6EBhZuhKkDJD'. Preview of field's value: '''>","location":null,"tag":"kubernetes.var.log.containers.ingress-nginx-controller-db9g7_ingress-nginx_controller-bbf47a8d00a202fd6c6cedf5bb09b95ee7064cdc76767e76a99d3899293ebf5d.log"
level":"warn","message":"dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error=\"400 - Rejected by OpenSearch [error type]: mapper_parsing_exception [reason]: 'failed to parse field [message] of type [text] in document with id 'yFWJzIQB2Da7ht6uxOv3'. Preview of field's value: '''\" location=nil tag=\"kubernetes.var.log.containers.ingress-nginx-controller-s7g2h_ingress-nginx_controller-ba8354ebc0f8978d622cdce4ff0810bf5de58825071a938c7b07803bb0109472.log\"
Expected behaviour
fluentd is able to ingest the logs or drops them.
Version (add all relevant versions):
- Compliant Kubernetes Apps version: v0.26.3
Definition of Done
- ingress-nginx logs are being ingested by fluentd
Timebox this to 3 days. If it takes more time than that, contact a scrum master.