dd-agent
dd-agent copied to clipboard
Agent proxy error: Flush ## is taking more than 10s, stopping it
Problem
I'm using the agent as a proxy as described here. I have around 15 hosts all using one server as the "proxy", which relays to the datadog public url.
Tailing the logs and restarting the datadog-agent service I find this:
$ tail -f /var/log/datadog/forwarder.log
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210486 from queue
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210485 from queue
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210484 from queue
2017-08-14 13:22:40 BST | WARNING | dd.forwarder | transaction(transaction.py:138) | Removed transaction 3210483 from queue
2017-08-14 13:22:40 BST | INFO | dd.forwarder | forwarder(ddagent.py:593) | caught sigterm. stopping
2017-08-14 13:22:40 BST | INFO | dd.forwarder | forwarder(ddagent.py:575) | Stopped
2017-08-14 13:22:40 BST | INFO | dd.forwarder | forwarder(ddagent.py:593) | caught sigterm. stopping
2017-08-14 13:22:46 BST | INFO | dd.forwarder | root(ddagent.py:146) | Done with custom emitters
2017-08-14 13:22:46 BST | INFO | dd.forwarder | forwarder(ddagent.py:542) | Listening on port 17123
2017-08-14 13:22:46 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 1 transaction during flush #1
2017-08-14 13:22:47 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 3 transactions during flush #2
2017-08-14 13:22:49 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 7 transactions during flush #3
2017-08-14 13:22:53 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 12 transactions during flush #4
2017-08-14 13:22:59 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 17 transactions during flush #5
2017-08-14 13:22:59 BST | INFO | dd.forwarder | transaction(transaction.py:195) | First flushes done, next flushes will be logged every 20 flushes.
2017-08-14 13:23:18 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 6 is taking more than 10s, stopping it
2017-08-14 13:23:28 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 7 is taking more than 10s, stopping it
2017-08-14 13:23:39 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 8 is taking more than 10s, stopping it
2017-08-14 13:23:49 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 9 is taking more than 10s, stopping it
2017-08-14 13:24:00 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 10 is taking more than 10s, stopping it
2017-08-14 13:24:10 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 11 is taking more than 10s, stopping it
2017-08-14 13:24:20 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 12 is taking more than 10s, stopping it
2017-08-14 13:24:31 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 13 is taking more than 10s, stopping it
2017-08-14 13:24:41 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 14 is taking more than 10s, stopping it
2017-08-14 13:24:52 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 15 is taking more than 10s, stopping it
2017-08-14 13:25:02 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 16 is taking more than 10s, stopping it
2017-08-14 13:25:13 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 17 is taking more than 10s, stopping it
2017-08-14 13:25:23 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 18 is taking more than 10s, stopping it
2017-08-14 13:25:34 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 19 is taking more than 10s, stopping it
2017-08-14 13:25:34 BST | INFO | dd.forwarder | transaction(transaction.py:177) | Flushing 345 transactions during flush #20
2017-08-14 13:25:45 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 20 is taking more than 10s, stopping it
2017-08-14 13:25:55 BST | WARNING | dd.forwarder | transaction(transaction.py:212) | Flush 21 is taking more than 10s, stopping it
This goes on like this. I still see the data in the graphs on the datadog web interface though, although I'm interested in a process check on one of the servers which doesn't appear in the web, I'm wondering is it getting lost because the proxy doesn't pass it on.
Output of the info page
$ sudo service datadog-agent info
2017-08-14 13:32:08,134 | WARNING | dd.collector | utils.dockerutil(dockerutil.py:94) | Error while detecting orchestrator: Error while fetching server API version: ('Connection aborted.', error(2, 'No such file or directory'))
====================
Collector (v 5.16.0)
====================
Status date: 2017-08-14 13:31:52 (15s ago)
Pid: 6878
Platform: Linux-4.4.0-83-generic-x86_64-with-Ubuntu-16.04-xenial
Python Version: 2.7.13, 64bit
Logs: <stderr>, /var/log/datadog/collector.log, syslog:/dev/log
Clocks
======
NTP offset: 52.3057 s
System UTC time: 2017-08-14 12:32:08.188551
Paths
=====
conf.d: /etc/dd-agent/conf.d
checks.d: /opt/datadog-agent/agent/checks.d
Hostnames
=========
socket-hostname: <redacted>
hostname: <redacted>
socket-fqdn: <redacted>
Checks
======
network (5.16.0)
----------------
- instance #0 [OK]
- Collected 19 metrics, 0 events & 0 service checks
process (5.16.0)
----------------
- instance #0 [OK]
- instance #1 [OK]
- instance #2 [WARNING]
Warning: No matching process was found
- Collected 33 metrics, 0 events & 3 service checks
ntp (5.16.0)
------------
- Collected 0 metrics, 0 events & 0 service checks
disk (5.16.0)
-------------
- instance #0 [OK]
- Collected 32 metrics, 0 events & 0 service checks
http_check (5.16.0)
-------------------
- instance #0 [OK]
- instance #1 [OK]
- instance #2 [OK]
- instance #3 [OK]
- instance #4 [OK]
- instance #5 [OK]
- instance #6 [OK]
- instance #7 [OK]
- instance #8 [OK]
- Collected 9 metrics, 0 events & 9 service checks
Emitters
========
- http_emitter [OK]
====================
Dogstatsd (v 5.16.0)
====================
Status date: 2017-08-14 13:32:06 (1s ago)
Pid: 6875
Platform: Linux-4.4.0-83-generic-x86_64-with-Ubuntu-16.04-xenial
Python Version: 2.7.13, 64bit
Logs: <stderr>, /var/log/datadog/dogstatsd.log, syslog:/dev/log
Flush count: 56
Packet Count: 952
Packets per second: 1.7
Metric count: 9
Event count: 0
Service check count: 0
====================
Forwarder (v 5.16.0)
====================
Status date: 2017-08-14 13:32:05 (3s ago)
Pid: 6874
Platform: Linux-4.4.0-83-generic-x86_64-with-Ubuntu-16.04-xenial
Python Version: 2.7.13, 64bit
Logs: <stderr>, /var/log/datadog/forwarder.log, syslog:/dev/log
Queue Size: 2340983 bytes
Queue Length: 1166
Flush Count: 57
Transactions received: 2158
Transactions flushed: 992
Transactions rejected: 0
API Key Status: API Key is valid
======================
Trace Agent (v 5.16.0)
======================
Pid: 6873
Uptime: 566 seconds
Mem alloc: 2710240 bytes
Hostname: <redacted>
Receiver: 0.0.0.0:8126
API Endpoint: https://trace.agent.datadoghq.com
Bytes received (1 min): 0
Traces received (1 min): 0
Spans received (1 min): 0
Bytes sent (1 min): 0
Traces sent (1 min): 0
Stats sent (1 min): 0
Additional environment details (Operating System, Cloud provider, etc):
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
Steps to reproduce the issue:
- Configure an agent as a proxy as described here.
- Configure lots of hosts (around 15 in my case) to send stats to the proxy
Describe the results you received:
See above
Describe the results you expected:
See above