opentelemetry-operator
# PodLogMonitor resource for better logging self-service in Kubernetes deployments
### Component(s)
receiver/filelog
### Is your feature request related to a problem? Please describe.
I manage the logging system for an internal developer platform that serves many engineering teams, which together produce a large volume of logs. Currently we have a mix of stdout scraping and direct-from-application logging. To use the stdout scraping, the engineering teams add a special label to the pods they want our logging system to scrape. This works by having a Fluent Bit daemonset tail the Kubernetes log files, enrich the log records with Kubernetes metadata, and then discard the records depending on whether they carry the special label.
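For illustration, the current Fluent Bit pipeline looks roughly like the sketch below (written in Fluent Bit's YAML config format; the label key `platform.example.com/collect-logs` is a made-up stand-in for our real opt-in label):

```yaml
pipeline:
  inputs:
    # Tail every container log file on the node
    - name: tail
      path: /var/log/containers/*.log
      tag: kube.*
  filters:
    # Enrich every record with pod metadata from the Kubernetes API
    - name: kubernetes
      match: kube.*
    # Only now drop records from pods that did not opt in via the label
    - name: grep
      match: kube.*
      regex: $kubernetes['labels']['platform.example.com/collect-logs'] true
  outputs:
    - name: forward
      match: kube.*
      host: logging-backend.example.internal
      port: 24224
```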
There are two problems with this approach:
- **Unnecessary use of resources.** Since Fluent Bit is set to monitor all files in Kubernetes' stdout directory via `/var/log/containers/*.log`, every single log line from every container on each node is passed into Fluent Bit, only for a portion of them to be discarded. This requires Fluent Bit to query the Kubernetes API for information that ultimately isn't needed. It may have a cache under the hood, but the system still performs a lot of unnecessary operations, even for containers that neither we nor the external teams have any interest in capturing logs from.
- **Bad log parsing.** Since we can't reconfigure Fluent Bit with custom parsers for each application across the platform, we have a policy of only parsing logs that are in JSON format. If they are not, we simply ship them as unparsed strings to our logging backend. This leaves many logs unparsed, including all multiline logs.
We are looking into moving over to the OpenTelemetry Operator with Collectors in daemonset mode. However, this setup would currently have to work exactly like the system described above: ingest all log lines and then use the `k8sattributesprocessor` to add the pod label information.
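A rough sketch of what that daemonset Collector config could look like (the label key, filter condition, and exporter are placeholders, not part of this proposal):

```yaml
receivers:
  filelog:
    include: [ /var/log/containers/*.log ]

processors:
  # Enrich every record with pod metadata, including the opt-in label
  k8sattributes:
    extract:
      metadata: [ k8s.pod.name, k8s.namespace.name ]
      labels:
        - tag_name: platform.example.com/collect-logs
          key: platform.example.com/collect-logs
          from: pod
  # Drop everything that did not opt in via the label
  filter/optin:
    logs:
      log_record:
        - 'resource.attributes["platform.example.com/collect-logs"] != "true"'

exporters:
  otlp:
    endpoint: logging-backend.example.internal:4317

service:
  pipelines:
    logs:
      receivers: [ filelog ]
      processors: [ k8sattributes, filter/optin ]
      exporters: [ otlp ]
```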
### Describe the solution you'd like
A solution to this problem could take inspiration from Prometheus' `PodMonitor` resource, which uses a pod `selector` and a `podMetricsEndpoints` section to tell Prometheus which pods expose metrics on which endpoints.
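For reference, a typical PodMonitor looks roughly like this (values are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-podmonitor
  namespace: fullstackapp
spec:
  selector:
    matchLabels:
      app: backend
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
```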
Something similar could be implemented for logging, for example via a `PodLogMonitor` resource. This could tell a special filelog receiver which containers to monitor and how logs from each container should be parsed. It would probably require a dedicated filelog receiver in the Collector daemonset, so that the system knows which receiver should be affected, for example by introducing a setting like `usePodLogMonitor`:
```yaml
receivers:
  filelog:
    usePodLogMonitor: true
    operators:
      ...
```
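For context, this special receiver would live in the operator-managed daemonset Collector, roughly like this (the `usePodLogMonitor` field is of course hypothetical, and volume mounts for `/var/log` are omitted for brevity):

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: logging-agent
spec:
  mode: daemonset
  config:
    receivers:
      filelog:
        usePodLogMonitor: true   # hypothetical flag from this proposal
    exporters:
      otlp:
        endpoint: logging-backend.example.internal:4317
    service:
      pipelines:
        logs:
          receivers: [ filelog ]
          exporters: [ otlp ]
```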
Pod selection could have an easy solution, since files in the log directory already follow a strict naming structure:
`/var/log/containers/<pod>_<namespace>_<container>-<container-id>.log`
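For example, a (made-up) log file for a backend pod in the `fullstackapp` namespace would look like:
`/var/log/containers/backend-7c9d55f6b8-x2kqp_fullstackapp_server-0a1b2c3d4e5f.log`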
This means a `PodLogMonitor` resource could follow a simple heuristic, looking only at the pod name, namespace, or container name. For example, the following PodLogMonitor:
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: PodLogMonitor
metadata:
  name: example-podlogmonitor
  namespace: fullstackapp
  labels:
    team: webdevs
spec:
  selectors:
    - podName: backend
      containerName: server
      operators:
        - type: regex_parser
          regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
    - podName: backend
      containerName: istio-proxy
      operators:
        - type: json_parser
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%d %H:%M:%S'
    - podName: frontend
      operators:
        - type: regex_parser
          regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```
Would result in the following filelog receiver config in the daemonset Collectors:
```yaml
receivers:
  filelog:
    include:
      - /var/log/containers/backend*_fullstackapp_server-*.log
      - /var/log/containers/backend*_fullstackapp_istio-proxy-*.log
      - /var/log/containers/frontend*_fullstackapp_*-*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        if: 'attributes["log.file.name"] matches "^backend.*_fullstackapp_server-.*.log$"'
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        if: 'attributes["log.file.name"] matches "^backend.*_fullstackapp_istio-proxy-.*.log$"'
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        if: 'attributes["log.file.name"] matches "^frontend.*_fullstackapp_.*-.*.log$"'
```
There are some challenges here with regard to how the config should update the filelog receiver upon deployment of a PodLogMonitor. One possibility is that the receiver itself watches PodLogMonitor objects and occasionally updates itself. However, the actual config would then not be reflected in the OpenTelemetryCollector CRD and would only change "under the hood", at least without custom logic to update the CRD. This approach would probably be better suited to a separate receiver, for example a `filelogreceiverk8s`. The benefit there is that it might be possible to implement it dynamically, so that the whole config file doesn't need to be reloaded.
Another solution could be that the operator watches PodLogMonitor resources and updates the special filelog receiver that has `usePodLogMonitor: true` set in the OpenTelemetryCollector CRD, provided it is in daemonset mode. It should then use the hot-restart feature to avoid pod redeploys.
This would allow platforms to provide an awesome self-service logging solution to their users, as they would not have to deal with credentials or sidecars. All they would have to do is deploy a PodLogMonitor along with their application, and the OTel pipeline provided by the platform team could do the rest. The OpenTelemetry Collector is in a unique position to leverage its operator for this kind of functionality, which differentiates it from Fluent Bit/Fluentd.
### Describe alternatives you've considered
The PodLogMonitor could instead use the common pod-selector pattern, which would allow the receiver to also select pods based on labels. However, this would require the operator to look up the matching pods in the cluster and update the collector configs more often as pods are scheduled. It would probably also require the filelog receiver to have full pod names in its include list.
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: PodLogMonitor
metadata:
  name: example-podlogmonitor-podselector
  namespace: fullstackapp
  labels:
    team: webdevs
spec:
  selectors:
    - selector:
        matchLabels:
          app: example-app
      operators:
        - type: regex_parser
          regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```
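With label selectors, the operator would likely have to resolve the selector to concrete pod names and regenerate the include list as pods come and go, roughly like this (pod names are made up):

```yaml
receivers:
  filelog:
    include:
      - /var/log/containers/example-app-7d9f8b6c4d-abcde_fullstackapp_*-*.log
      - /var/log/containers/example-app-7d9f8b6c4d-fghij_fullstackapp_*-*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```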
Another alternative could be that a unique filelog receiver gets added to the config for each PodLogMonitor, or each entry in the PodLogMonitor.
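A sketch of how that could look, with receiver names derived from the PodLogMonitor (purely illustrative):

```yaml
receivers:
  filelog/fullstackapp_example-podlogmonitor:
    include: [ /var/log/containers/backend*_fullstackapp_server-*.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
  filelog/fullstackapp_other-podlogmonitor:
    include: [ /var/log/containers/frontend*_fullstackapp_*-*.log ]
    operators:
      - type: json_parser
```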
### Additional context
I have been trying to find an existing solution that provides functionality like this, but have not been able to. Please let me know if this or something similar has been done in another application.
Everything I have written here is of course just a suggestion; there is probably a better name and structure for the "PodLogMonitor" CRD than what I have proposed here.
I would love to hear your input on both the concept and the technical details.