agent: promtail causing very high CPU load when running and stopping
Hey there 👋 Thanks for the report!
I read through the linked Loki issue; are you encountering the same problem? That is, high CPU usage when a workflow stops, or does it happen at a random moment?
We currently haven't ported the commit that closed the linked issue, but I think it would be easy to do so and see if it makes any difference.
Also, since it seems you're comfortable with pprof, would you mind generating a 29-second CPU profile when the issue appears and uploading the .gz files so we could take a look ourselves as well?
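In case it helps, here is a minimal sketch of fetching that profile programmatically. It assumes the agent exposes the standard Go pprof handlers on its HTTP server and that the server listens on 127.0.0.1:12345; adjust the address and port to whatever your server block actually uses.

```go
// Minimal sketch: fetch a CPU profile from the agent's pprof endpoint and
// save it as profile.gz. The address 127.0.0.1:12345 is an assumption;
// point it at whatever host:port your HTTP server actually listens on.
package main

import (
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// The standard Go pprof handler blocks for ?seconds=N while sampling,
	// then returns a gzipped profile, so give the client some headroom.
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Get("http://127.0.0.1:12345/debug/pprof/profile?seconds=29")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("profile.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```

The resulting file can be inspected locally with `go tool pprof profile.gz` before attaching it here.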
What we have observed so far is that it happens when the service starts and stops, but it also occurs occasionally at random times.
NOTE: during troubleshooting, the likely cause appears to be that, due to incorrect configuration, two different services watch the same file path (a path that does not exist on the server).
The problem was reproduced locally: profile.gz
ENV: nomad + consul
config:
- job_name: consul
  consulagent_sd_configs:
    - server: '127.0.0.1:8500'
  relabel_configs:
    - source_labels: [__meta_consulagent_tags]
      regex: .*,(loki|dapr),.*
      action: keep
    #1 - source_labels: [__meta_consulagent_service_metadata_ALLOC_ID]
    #2   regex: \w{8}-\w{4}-\w{4}-\w{4}-\w{12}
    #3   action: keep
    - source_labels: [__meta_consulagent_service]
      target_label: service
    - source_labels: [__meta_consulagent_node]
      target_label: node
    - source_labels: [__meta_consulagent_service_metadata_env]
      target_label: env
    - source_labels: [__meta_consulagent_service_metadata_app_code]
      target_label: app_code
    - source_labels: [__meta_consulagent_service_metadata_ALLOC_ID]
      target_label: __path__
      replacement: '/opt/nomad/data/alloc/${1}/alloc/logs/[a-zA-Z]*'
  pipeline_stages:
    - labeldrop:
        - filename
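For context on why the missing ALLOC_ID matters: the last relabel rule above has no explicit regex, so the default (.*) applies and ${1} expands to whatever the ALLOC_ID metadata contains. If that metadata is missing, ${1} is empty and every such target ends up with the same non-existent __path__. Below is a simplified Go sketch of that substitution; it mimics (but does not reuse) the Prometheus relabelling logic, and the example alloc ID is made up.

```go
// Simplified illustration of the "replacement" relabel step that builds
// __path__ from __meta_consulagent_service_metadata_ALLOC_ID. This mimics
// the Prometheus relabelling behaviour (anchored default regex, ${1}
// expansion) but is not the actual library code.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Default relabel regex "(.*)", anchored the way Prometheus anchors it.
	re := regexp.MustCompile(`^(?:(.*))$`)
	replacement := "/opt/nomad/data/alloc/${1}/alloc/logs/[a-zA-Z]*"

	for _, allocID := range []string{
		"0b0fa7bc-1c43-4c26-9dfe-8b44a1f8c123", // hypothetical ALLOC_ID value
		"",                                     // ALLOC_ID metadata missing
	} {
		m := re.FindStringSubmatchIndex(allocID)
		path := string(re.ExpandString(nil, replacement, allocID, m))
		fmt.Printf("ALLOC_ID=%q -> __path__=%q\n", allocID, path)
	}
}
```

With the keep rules marked 1-3 commented out, targets without an ALLOC_ID are not filtered away, so two or more services end up tailing the same path that does not exist on the server, which matches the note above.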
Steps:
1. Comment out the configuration lines marked 1, 2, 3.
2. Perform a rolling upgrade of a service whose alloc_id meta is missing (both the old and the new service registrations are missing the alloc_id meta).
3. The upgrade succeeds and the old service is stopped.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!