Label discovery jobs in Promtail

Open bgdnlp opened this issue 3 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Trying to get EC2 labels added to logs.

For a log aggregation system that is built around labels it is weird to me that I can't add labels discovered automatically from different sources. For example, #2707 asks to be able to add EC2 tags to journal logs, which sounds reasonable.

Adding EC2 labels to a log is currently done in a roundabout way: the local machine gets information about all the EC2 instances, then filters out everything but itself. Even that only works as long as __host__ is set, which is unexpected. On top of that, __path__ is not a list as far as I understand, which makes it awkward to add files under different paths in the same service discovery.

Describe the solution you'd like

I'd like a way to discover labels from different sources, separately from target discovery. Here's an example config file:

label_discovery:
  - job_name: ec2_metadata
    custom_script:
      - path: /my/custom/label/discovery/script
  - job_name: ec2_describe_instance
    ec2_sd_configs:
      - region: us-east-2
        access_key: REDACTED
        secret_key: REDACTED
        filters:
          - name: instance-id
            value: instance_id_returned_by_my_custom_script

scrape_configs:
  - job_name: journal
    journal:
      path: /var/log/journal
    label_discovery_jobs:
      - ec2_describe_instance
    relabel_configs:
      - source_labels: ['__meta_ec2_tag_OwnerTeam']
        target_label: owner_team
      - source_labels: ['__meta_ec2_private_dns_name']
        target_label: ec2_private_hostname
      - source_labels: ['__journal__hostname']
        target_label: local_hostname
  - job_name: log_files
    static_configs:
      - targets:
          - localhost
        labels:
          __path__:
            - /var/log/messages
            - /var/log/nginx/*error*.log
            - /my/app/logs/*
    label_discovery_jobs:
      - ec2_describe_instance
    relabel_configs:
      - source_labels: ['__meta_ec2_tag_OwnerTeam']
        target_label: owner_team
      - source_labels: ['__meta_ec2_private_dns_name']
        target_label: ec2_private_hostname

There's a new section, label_discovery. It could be a setting on each scrape job, but moving it outside allows for reusing the same discovery for multiple scrape configurations. Then again, there's an argument to be made for using the labels from a scrape config as a label discovery parameter.

Label discovery jobs are run in order. Labels generated by a job can be used by subsequent jobs. This is consistent with how relabeling works. If the jobs don't run in sync (different refresh intervals), a later job simply uses whatever labels are available. As long as the previous job has run once, it has done what it was supposed to.
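The ordering rule can be sketched in a few lines of Python. This is only an illustration of the proposed behaviour, not Promtail code: `run_label_discovery`, the two job functions, and the sample label values are all hypothetical, and refresh intervals are ignored (a single pass is shown).

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, str]
# A label discovery job receives the labels known so far and returns new ones.
Job = Callable[[Labels], Labels]


def run_label_discovery(jobs: List[Tuple[str, Job]]) -> Dict[str, Labels]:
    """Run jobs in config order; each job sees labels from all earlier jobs."""
    known: Labels = {}
    results: Dict[str, Labels] = {}
    for name, job in jobs:
        discovered = job(dict(known))  # pass a copy so a job cannot mutate state
        results[name] = discovered
        known.update(discovered)
    return results


# Hypothetical stand-ins for the two jobs in the example config.
def ec2_metadata(_known: Labels) -> Labels:
    return {"instance_id": "i-0123456789abcdef0"}  # made-up sample value


def ec2_describe_instance(known: Labels) -> Labels:
    # Relies on the instance ID produced by the previous job.
    return {
        "__meta_ec2_instance_id": known["instance_id"],
        "__meta_ec2_tag_OwnerTeam": "platform",  # made-up sample value
    }


label_sets = run_label_discovery([
    ("ec2_metadata", ec2_metadata),
    ("ec2_describe_instance", ec2_describe_instance),
])
```

Swapping the two entries in the list would make `ec2_describe_instance` fail with a `KeyError`, which is the ordering dependency described above.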

The first label discovery job is a custom script. This could be file_sd_config instead, with the "targets" key ignored, but a script might also make sense. I'm not sure about the security implications. Anyway, in this case the script would extract the EC2 instance ID from the EC2 instance metadata.
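As a rough idea of what such a script could look like, here is a sketch in Python that reads the instance ID through the EC2 instance metadata service (IMDSv2: request a session token, then query `meta-data/instance-id`). The output format is an assumption, since the proposal does not define one: a flat JSON object mapping label names to values. The names `fetch_instance_id`, `labels_as_json`, and `main` are hypothetical.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"  # link-local metadata endpoint on EC2


def fetch_instance_id(urlopen=urllib.request.urlopen, timeout=2.0) -> str:
    """Fetch the instance ID via IMDSv2 (session token first, then metadata)."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    id_req = urllib.request.Request(
        f"{IMDS}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urlopen(id_req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def labels_as_json(instance_id: str) -> str:
    # Assumed output shape: one JSON object of label name -> label value.
    return json.dumps({"instance_id": instance_id})


def main() -> None:
    # Only meaningful on an EC2 instance; elsewhere the request times out.
    print(labels_as_json(fetch_instance_id()))
```

Calling `main()` on an instance would print a single JSON line. The `urlopen` parameter exists only so the function can be exercised without network access.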

The second label discovery job calls the AWS API to get data about the machine it runs on. It uses the output of the previous job to retrieve data only about the current instance. That way setting __host__ isn't needed.
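To make the data flow concrete, the sketch below shows the two halves of that job in Python: building a DescribeInstances filter from the earlier job's label, and turning one instance description into `__meta_ec2_*` labels. It assumes the earlier job exposes a label named `instance_id` and that the instance description is a dict shaped like the EC2 API response (`InstanceId`, `PrivateDnsName`, `Tags`). The function names and sample data are hypothetical, and the actual API call is left out.

```python
import re
from typing import Dict, List


def instance_filter(labels: Dict[str, str]) -> List[dict]:
    """Build a DescribeInstances filter that matches only this instance."""
    return [{"Name": "instance-id", "Values": [labels["instance_id"]]}]


def ec2_meta_labels(instance: dict) -> Dict[str, str]:
    """Flatten one instance description into __meta_ec2_* labels."""
    labels = {
        "__meta_ec2_instance_id": instance["InstanceId"],
        "__meta_ec2_private_dns_name": instance.get("PrivateDnsName", ""),
    }
    for tag in instance.get("Tags", []):
        # Label names may only contain letters, digits and underscores.
        key = re.sub(r"[^a-zA-Z0-9_]", "_", tag["Key"])
        labels[f"__meta_ec2_tag_{key}"] = tag["Value"]
    return labels


# Made-up sample data for illustration only.
sample_instance = {
    "InstanceId": "i-0123456789abcdef0",
    "PrivateDnsName": "ip-10-0-0-1.us-east-2.compute.internal",
    "Tags": [{"Key": "OwnerTeam", "Value": "platform"}],
}
filters = instance_filter({"instance_id": "i-0123456789abcdef0"})
meta = ec2_meta_labels(sample_instance)
```

With the sample data, `meta` contains the `__meta_ec2_tag_OwnerTeam` and `__meta_ec2_private_dns_name` labels that the relabel rules in the example config refer to.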

In the scrape_configs section there's a new setting per job, label_discovery_jobs. It lists all the label_discovery jobs that should be available to this particular scrape job. It should probably default to all discovered sets. Alternatively, it might not need to be an option at all, but restricting which sets are available could allow using multiple label discovery jobs that would generate identical labels.
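A small Python sketch of the selection rule, assuming the default is "all discovered sets" when the setting is omitted. `select_label_sets` and the job names `ec2_primary` and `ec2_secondary` are hypothetical.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def select_label_sets(
    discovered: Dict[str, Labels],
    wanted: Optional[List[str]] = None,
) -> Dict[str, Labels]:
    """Return the label sets visible to one scrape job."""
    if wanted is None:  # setting omitted: expose every discovered set
        return dict(discovered)
    missing = [name for name in wanted if name not in discovered]
    if missing:
        raise KeyError(f"unknown label discovery jobs: {missing}")
    return {name: discovered[name] for name in wanted}


# Two hypothetical jobs that produce the same label name with different values.
discovered = {
    "ec2_primary": {"__meta_ec2_tag_OwnerTeam": "platform"},
    "ec2_secondary": {"__meta_ec2_tag_OwnerTeam": "data"},
}
only_primary = select_label_sets(discovered, ["ec2_primary"])
everything = select_label_sets(discovered)
```

Restricting a scrape job to `ec2_primary` avoids the clash between the two jobs, which is the case where the per-job setting would earn its keep.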

The labels from target discovery and label discovery should be merged into one set, with target discovery labels overwriting any label discovery labels with the same name. That way everything is available to relabel_configs and should also be available to pipelines.
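The precedence rule amounts to a dictionary merge in which target discovery is applied last. A minimal Python sketch, with a hypothetical `merge_labels` helper and made-up label values:

```python
from typing import Dict, Iterable

Labels = Dict[str, str]


def merge_labels(label_discovery_sets: Iterable[Labels], target_labels: Labels) -> Labels:
    """Merge label discovery sets in order, then let target labels win."""
    merged: Labels = {}
    for labels in label_discovery_sets:
        merged.update(labels)
    merged.update(target_labels)  # target discovery overwrites on name clashes
    return merged


merged = merge_labels(
    [{"__meta_ec2_tag_OwnerTeam": "platform", "hostname": "from-label-discovery"}],
    {"__path__": "/var/log/messages", "hostname": "from-target-discovery"},
)
```

Here `merged["hostname"]` ends up as `"from-target-discovery"`, while the EC2 tag label and `__path__` are both kept for relabeling.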

The second scrape config job lists multiple values for __path__, but that's another feature request. I am aware that multiple files can be listed in the same path using {/one/path,/another/path}. Still.
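Until a list is supported, the brace form can be generated mechanically from a list of paths. A small Python sketch, with a hypothetical `brace_pattern` helper. It assumes the glob implementation accepts `{a,b}` alternatives as described above, and it refuses paths that themselves contain braces or commas, since those would change the meaning of the pattern.

```python
from typing import List


def brace_pattern(paths: List[str]) -> str:
    """Join several glob paths into one {a,b,c} alternative pattern."""
    if not paths:
        raise ValueError("at least one path is required")
    for path in paths:
        if any(ch in path for ch in "{},"):
            raise ValueError(f"path cannot be used inside braces: {path!r}")
    if len(paths) == 1:
        return paths[0]
    return "{" + ",".join(paths) + "}"


pattern = brace_pattern([
    "/var/log/messages",
    "/var/log/nginx/*error*.log",
    "/my/app/logs/*",
])
# pattern == "{/var/log/messages,/var/log/nginx/*error*.log,/my/app/logs/*}"
```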

Describe alternatives you've considered

I guess something similar could be achieved with an external script that builds a file for file discovery with all the necessary data.
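A sketch of that alternative in Python: build file discovery target groups (a JSON list of objects with `targets` and `labels`) and write them to a file that a file_sd_configs entry would watch. It assumes the labels were collected elsewhere, for example from instance metadata, and that `__path__` can be supplied as a label in the discovery file. The helper names, the sample labels, and the output location are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def build_file_sd(paths: List[str], labels: Dict[str, str]) -> List[dict]:
    """One target group per path, since __path__ holds a single pattern."""
    return [
        {"targets": ["localhost"], "labels": {**labels, "__path__": path}}
        for path in paths
    ]


def write_atomically(path: str, groups: List[dict]) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp, path)


groups = build_file_sd(
    ["/var/log/messages", "/my/app/logs/*"],
    {"owner_team": "platform"},  # made-up sample label
)
```

Running something like `write_atomically("/etc/promtail/targets.json", groups)` from cron or a systemd timer would keep the file current. The drawback is the one this request is about: the discovery logic lives outside Promtail.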

Additional context

I'm not a (Go) programmer. I can't help with the implementation.

bgdnlp, Aug 05 '22 08:08