datadog-agent
[system-probe] report client-side TCP failed connections
What does this PR do?
This PR adds detection and reporting of client-side failed TCP connections. Failures are grouped per connection tuple, and each tuple carries a counter of its failed attempts. This feature needs the payload change in https://github.com/DataDog/agent-payload/pull/172 and pipelines for this PR will fail until that change is merged.
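To make the "counter per connection tuple" idea concrete, here is a minimal Python sketch of that bookkeeping. It is only an illustration and not the code in this PR: `ConnTuple`, `FailedConnTracker`, and their fields are hypothetical names, with the field set loosely modelled on the sample response in the QA section.

```python
from collections import Counter
from typing import Dict, NamedTuple


class ConnTuple(NamedTuple):
    """Hypothetical connection key (assumed fields, not the PR's actual type)."""
    pid: int
    laddr: str
    lport: int
    raddr: str
    rport: int
    family: str  # "v4" or "v6"
    net_ns: int


class FailedConnTracker:
    """Counts failed connection attempts, grouped by connection tuple."""

    def __init__(self) -> None:
        self._failures: Counter = Counter()

    def record_failure(self, conn: ConnTuple) -> None:
        # Tuples are hashable, so identical tuples share one counter entry.
        self._failures[conn] += 1

    def snapshot(self) -> Dict[ConnTuple, int]:
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._failures)
```

Two failures recorded for the same tuple yield a single entry with a count of 2, which is the shape the `failureCount` field in the response suggests.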
Motivation
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
- Build system-probe & start it.
- In another shell, generate a failing connection. One way to do this is by trying to connect to a closed port:
nc localhost 10000
The connection should promptly fail.
Another case, which takes longer to test, is a connection that times out. This can be set up with an iptables rule that drops the outgoing packets:
sudo iptables -A OUTPUT -p tcp -d 127.0.0.1 --dport 10000 -j DROP
nc localhost 10000
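After around 2 minutes, nc should fail, and you can proceed to the next step. Once done, the rule can be removed by running the same iptables command with -D instead of -A.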
- Poll system-probe for connections and check the failed connection appears in the response:
sudo curl -s --unix-socket /opt/datadog-agent/run/sysprobe.sock http://unix/network_tracer/connections|jq .failedConns
The answer should look like this:
[
  {
    "pid": 110443,
    "laddr": {
      "ip": "127.0.0.1",
      "port": 57308,
      "containerId": "",
      "hostId": "0",
      "hostName": ""
    },
    "raddr": {
      "ip": "127.0.0.1",
      "port": 10000,
      "containerId": "",
      "hostId": "0",
      "hostName": ""
    },
    "family": "v4",
    "type": "tcp",
    "direction": "outgoing",
    "netNS": 4026531840,
    "failureCount": "1"
  }
]
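The manual check can also be scripted. The sketch below is a hedged helper, not part of this PR: the function names are made up for illustration, and it assumes the response is a JSON object whose `failedConns` key holds entries shaped like the sample above. `attempt_connection` covers only the closed-port case, and the response text is expected to come from the curl command in the last step.

```python
import json
import socket
from typing import List


def attempt_connection(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the TCP connect succeeded, False if it failed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused as well as timeouts.
        return False


def failed_conns_to(response_text: str, raddr_ip: str, raddr_port: int) -> List[dict]:
    """Return the failed-connection entries whose remote address matches."""
    response = json.loads(response_text)
    # A missing or null "failedConns" key is treated as "no failures".
    entries = response.get("failedConns") or []
    return [
        entry for entry in entries
        if entry["raddr"]["ip"] == raddr_ip and entry["raddr"]["port"] == raddr_port
    ]


def total_failures(entries: List[dict]) -> int:
    # failureCount is a JSON string in the sample, so convert it explicitly.
    return sum(int(entry["failureCount"]) for entry in entries)
```

A QA run would call `attempt_connection("127.0.0.1", 10000)`, save the curl output to a string, and then check that `failed_conns_to(text, "127.0.0.1", 10000)` is not empty.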
Reviewer's Checklist
- [ ] If known, an appropriate milestone has been selected; otherwise the `Triage` milestone is set.
- [ ] Use the `major_change` label if your change either has a major impact on the code base, is impacting multiple teams or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer facing change use a releasenote.
- [ ] A release note has been added or the `changelog/no-changelog` label has been applied.
- [ ] Changed code has automated tests for its functionality.
- [ ] Adequate QA/testing plan information is provided if the `qa/skip-qa` label is not applied.
- [ ] At least one `team/..` label has been applied, indicating the team(s) that should QA this change.
- [ ] If applicable, docs team has been notified or an issue has been opened on the documentation repo.
- [ ] If applicable, the `need-change/operator` and `need-change/helm` labels have been applied.
- [ ] If applicable, the `k8s/<min-version>` label, indicating the lowest Kubernetes version compatible with this feature.
- [ ] If applicable, the config template has been updated.