agent: promtail causing very high CPU load when running and stopping
Hey there 👋 Thanks for the report!
I read through the linked Loki issue; are you encountering the same problem? That is, high CPU usage when a workflow stops, or does it happen at a random moment?
We currently haven't ported the commit that closed the linked issue, but I think it would be easy to do so and see if it makes any difference.
Also, since it seems you're comfortable with pprof, would you mind generating a 29-second CPU profile when the issue appears and uploading the .gz files so we could take a look ourselves as well?
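In case it helps, here is a minimal sketch of fetching that profile programmatically. It assumes the agent exposes the standard Go pprof handlers on its HTTP server and that the server listens on 127.0.0.1:12345; adjust the address and port to whatever your server block actually uses.

```go
// Minimal sketch: fetch a CPU profile from the agent's pprof endpoint and
// save it as profile.gz. The address 127.0.0.1:12345 is an assumption;
// point it at whatever host:port your HTTP server actually listens on.
package main

import (
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// The standard Go pprof handler blocks for ?seconds=N while sampling,
	// then returns a gzipped profile, so give the client some headroom.
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Get("http://127.0.0.1:12345/debug/pprof/profile?seconds=29")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("profile.gz")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```

The resulting file can be inspected locally with `go tool pprof profile.gz` before attaching it here.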
What we have observed so far is that it happens when the service starts and stops, but it also occurs occasionally at random times.
NOTE: during troubleshooting, the likely cause appears to be that, due to incorrect configuration, two different services watch the same file path (a path that does not exist on the server).
The problem was reproduced locally: profile.gz
ENV: nomad + consul
config:
- job_name: consul
  consulagent_sd_configs:
    - server: '127.0.0.1:8500'
  relabel_configs:
    - source_labels: [__meta_consulagent_tags]
      regex: .*,(loki|dapr),.*
      action: keep
    #1 - source_labels: [__meta_consulagent_service_metadata_ALLOC_ID]
    #2   regex: \w{8}-\w{4}-\w{4}-\w{4}-\w{12}
    #3   action: keep
    - source_labels: [__meta_consulagent_service]
      target_label: service
    - source_labels: [__meta_consulagent_node]
      target_label: node
    - source_labels: [__meta_consulagent_service_metadata_env]
      target_label: env
    - source_labels: [__meta_consulagent_service_metadata_app_code]
      target_label: app_code
    - source_labels: [__meta_consulagent_service_metadata_ALLOC_ID]
      target_label: __path__
      replacement: '/opt/nomad/data/alloc/${1}/alloc/logs/[a-zA-Z]*'
  pipeline_stages:
    - labeldrop:
        - filename
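For context on why the missing ALLOC_ID matters: the last relabel rule above has no explicit regex, so the default (.*) applies and ${1} expands to whatever the ALLOC_ID metadata contains. If that metadata is missing, ${1} is empty and every such target ends up with the same non-existent __path__. Below is a simplified Go sketch of that substitution; it mimics (but does not reuse) the Prometheus relabelling logic, and the example alloc ID is made up.

```go
// Simplified illustration of the "replacement" relabel step that builds
// __path__ from __meta_consulagent_service_metadata_ALLOC_ID. This mimics
// the Prometheus relabelling behaviour (anchored default regex, ${1}
// expansion) but is not the actual library code.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Default relabel regex "(.*)", anchored the way Prometheus anchors it.
	re := regexp.MustCompile(`^(?:(.*))$`)
	replacement := "/opt/nomad/data/alloc/${1}/alloc/logs/[a-zA-Z]*"

	for _, allocID := range []string{
		"0b0fa7bc-1c43-4c26-9dfe-8b44a1f8c123", // hypothetical ALLOC_ID value
		"",                                     // ALLOC_ID metadata missing
	} {
		m := re.FindStringSubmatchIndex(allocID)
		path := string(re.ExpandString(nil, replacement, allocID, m))
		fmt.Printf("ALLOC_ID=%q -> __path__=%q\n", allocID, path)
	}
}
```

With the keep rules marked 1-3 commented out, targets without an ALLOC_ID are not filtered away, so two or more services end up tailing the same path that does not exist on the server, which matches the note above.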
Steps:
1. Comment out the configuration lines marked 1, 2, 3.
2. Perform a rolling upgrade of a service whose alloc_id meta is missing (both the old and the new service registrations are missing the alloc_id meta).
3. The upgrade succeeds and the old service is stopped.
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!