opentelemetry-operator
# PodLogMonitor resource for better logging self-service in Kubernetes deployments
### Component(s)
receiver/filelog
### Is your feature request related to a problem? Please describe.
I manage the logging system for an internal developer platform that serves many engineering teams, which together produce a large volume of logs. Currently we have a mix of stdout scraping and direct-from-application logging. To use the stdout scraping, the engineering teams add a special label to the pods they want our logging system to scrape. This works by having a Fluent Bit daemonset tail the Kubernetes log files, enrich the log records with Kubernetes metadata, and then discard the records depending on whether they carry the special label.
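For illustration, the current Fluent Bit pipeline looks roughly like the sketch below (written in Fluent Bit's YAML config format; the label key `platform.example.com/collect-logs` is a made-up stand-in for our real opt-in label):

```yaml
pipeline:
  inputs:
    # Tail every container log file on the node
    - name: tail
      path: /var/log/containers/*.log
      tag: kube.*
  filters:
    # Enrich every record with pod metadata from the Kubernetes API
    - name: kubernetes
      match: kube.*
    # Only now drop records from pods that did not opt in via the label
    - name: grep
      match: kube.*
      regex: $kubernetes['labels']['platform.example.com/collect-logs'] true
  outputs:
    - name: forward
      match: kube.*
      host: logging-backend.example.internal
      port: 24224
```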
There are two problems with this approach:
- **Unnecessary use of resources.** Since Fluent Bit is set to monitor all files in Kubernetes' stdout directory via `/var/log/containers/*.log`, every single log line from every container on each node is passed into Fluent Bit, only for a portion of them to be discarded. This requires Fluent Bit to query the Kubernetes API for information that ultimately isn't needed. It may have a cache under the hood, but the system still performs a lot of unnecessary operations, even for containers that neither we nor the external teams have any interest in capturing logs from.
- **Bad log parsing.** Since we can't reconfigure Fluent Bit with custom parsers for each application across the platform, we have a policy of only parsing logs that are in JSON format. If they are not, we simply ship them as unparsed strings to our logging backend. This leaves many logs unparsed, including all multiline logs.
We are looking into moving over to the OpenTelemetry Operator with Collectors in daemonset mode. However, this setup would currently have to work exactly like the system described above: ingest all log lines and then use the `k8sattributesprocessor` to add the pod label information.
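A rough sketch of what that daemonset Collector config could look like (the label key, filter condition, and exporter are placeholders, not part of this proposal):

```yaml
receivers:
  filelog:
    include: [ /var/log/containers/*.log ]

processors:
  # Enrich every record with pod metadata, including the opt-in label
  k8sattributes:
    extract:
      metadata: [ k8s.pod.name, k8s.namespace.name ]
      labels:
        - tag_name: platform.example.com/collect-logs
          key: platform.example.com/collect-logs
          from: pod
  # Drop everything that did not opt in via the label
  filter/optin:
    logs:
      log_record:
        - 'resource.attributes["platform.example.com/collect-logs"] != "true"'

exporters:
  otlp:
    endpoint: logging-backend.example.internal:4317

service:
  pipelines:
    logs:
      receivers: [ filelog ]
      processors: [ k8sattributes, filter/optin ]
      exporters: [ otlp ]
```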
### Describe the solution you'd like
A solution to this problem could take inspiration from Prometheus' `PodMonitor` resource, which uses a pod `selector` and a `podMetricsEndpoints` section to tell Prometheus which pods expose metrics on which endpoints.
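For reference, a typical PodMonitor looks roughly like this (values are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-podmonitor
  namespace: fullstackapp
spec:
  selector:
    matchLabels:
      app: backend
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
```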
Something similar could be implemented for logging, for example via a `PodLogMonitor` resource. This could tell a special filelog receiver which containers to monitor and how logs from each container should be parsed. It would probably require a dedicated filelog receiver in the Collector daemonset, so that the system knows which receiver should be affected, for example by introducing a setting like `usePodLogMonitor`:
```yaml
receivers:
  filelog:
    usePodLogMonitor: true
    operators:
      ...
```
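For context, this special receiver would live in the operator-managed daemonset Collector, roughly like this (the `usePodLogMonitor` field is of course hypothetical, and volume mounts for `/var/log` are omitted for brevity):

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: logging-agent
spec:
  mode: daemonset
  config:
    receivers:
      filelog:
        usePodLogMonitor: true   # hypothetical flag from this proposal
    exporters:
      otlp:
        endpoint: logging-backend.example.internal:4317
    service:
      pipelines:
        logs:
          receivers: [ filelog ]
          exporters: [ otlp ]
```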
Pod selection could have an easy solution, since files in the log directory already follow a strict naming structure:
`/var/log/containers/<pod>_<namespace>_<container>-<container-id>.log`
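For example, a (made-up) log file for a backend pod in the `fullstackapp` namespace would look like:
`/var/log/containers/backend-7c9d55f6b8-x2kqp_fullstackapp_server-0a1b2c3d4e5f.log`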
This means a `PodLogMonitor` resource could follow a simple heuristic, looking only at the pod name, namespace, or container name. For example, the following PodLogMonitor:
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: PodLogMonitor
metadata:
  name: example-podlogmonitor
  namespace: fullstackapp
  labels:
    team: webdevs
spec:
  selectors:
    - podName: backend
      containerName: server
      operators:
        - type: regex_parser
          regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
    - podName: backend
      containerName: istio-proxy
      operators:
        - type: json_parser
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%d %H:%M:%S'
    - podName: frontend
      operators:
        - type: regex_parser
          regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```
Would result in the following filelog receiver config in the daemonset Collectors:
```yaml
receivers:
  filelog:
    include:
      - /var/log/containers/backend*_fullstackapp_server-*.log
      - /var/log/containers/backend*_fullstackapp_istio-proxy-*.log
      - /var/log/containers/frontend*_fullstackapp_*-*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        if: 'attributes["log.file.name"] matches "^backend.*_fullstackapp_server-.*.log$"'
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        if: 'attributes["log.file.name"] matches "^backend.*_fullstackapp_istio-proxy-.*.log$"'
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        if: 'attributes["log.file.name"] matches "^frontend.*_fullstackapp_.*-.*.log$"'
```
There are some challenges here with regard to how the config should update the filelog receiver upon deployment of a PodLogMonitor. One possibility is that the receiver itself watches PodLogMonitor objects and occasionally updates itself. However, the actual config would then not be reflected in the OpenTelemetryCollector CRD and would only change "under the hood", at least without custom logic to update the CRD. This approach would probably be better suited to a separate receiver, for example a `filelogreceiverk8s`. The benefit there is that it might be possible to implement it dynamically, so that the whole config file doesn't need to be reloaded.
Another solution could be that the operator watches PodLogMonitor resources and updates the special filelog receiver that has `usePodLogMonitor: true` set in the OpenTelemetryCollector CRD, provided it is in daemonset mode. It should then use the hot-restart feature to avoid pod redeploys.
This would allow platforms to provide an awesome self-service logging solution to their users, as they would not have to deal with credentials or sidecars. All they would have to do is deploy a PodLogMonitor along with their application, and the OTel pipeline provided by the platform team could do the rest. The OpenTelemetry Collector is in a unique position to leverage its operator for this kind of functionality, which differentiates it from Fluent Bit/Fluentd.
### Describe alternatives you've considered
The PodLogMonitor could instead use the common pod-selector pattern, which would allow the receiver to also select pods based on labels. However, this would require the operator to look up the matching pods in the cluster and update the collector configs more often as pods are scheduled. It would probably also require the filelog receiver to have full pod names in its include list.
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: PodLogMonitor
metadata:
  name: example-podlogmonitor-podselector
  namespace: fullstackapp
  labels:
    team: webdevs
spec:
  selectors:
    - selector:
        matchLabels:
          app: example-app
      operators:
        - type: regex_parser
          regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```
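With label selectors, the operator would likely have to resolve the selector to concrete pod names and regenerate the include list as pods come and go, roughly like this (pod names are made up):

```yaml
receivers:
  filelog:
    include:
      - /var/log/containers/example-app-7d9f8b6c4d-abcde_fullstackapp_*-*.log
      - /var/log/containers/example-app-7d9f8b6c4d-fghij_fullstackapp_*-*.log
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
```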
Another alternative could be that a unique filelog receiver gets added to the config for each PodLogMonitor, or each entry in the PodLogMonitor.
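A sketch of how that could look, with receiver names derived from the PodLogMonitor (purely illustrative):

```yaml
receivers:
  filelog/fullstackapp_example-podlogmonitor:
    include: [ /var/log/containers/backend*_fullstackapp_server-*.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
  filelog/fullstackapp_other-podlogmonitor:
    include: [ /var/log/containers/frontend*_fullstackapp_*-*.log ]
    operators:
      - type: json_parser
```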
### Additional context
I have been trying to find an existing solution that provides functionality like this, but have not been able to. Please let me know if this or something similar has been done in another application.
Everything I have written here is of course just a suggestion; there is probably a better name and structure for the "PodLogMonitor" CRD than what I have proposed here.
I would love to hear your input on both the concept and the technical details.