datadog-agent
[system-probe] Add process monitoring and USM tagging
What does this PR do?
- Adds USM tags for connection processes. Only the `DD_ENV`, `DD_SERVICE`, and `DD_VERSION` environment variables are added, as `env:`, `service:`, and `version:` respectively.
- `runtime_security_config.event_monitoring.enabled` has been replaced with two new configs, `event_monitoring_config.network_process.enabled` and `event_monitoring_config.process.enabled`, both `false` by default. Both will turn on the runtime security module.
- A new config `event_monitoring_config.network_process.max_tracked_processes`, set to `1024` by default; this is the size of the process LRU cache described below.
- Process data from the runtime security module is stored in a new LRU cache. Only process data for processes that have the USM environment variables (see above) or have a container ID is stored.
Motivation
Additional Notes
~33% increase in CPU usage, mostly coming from the runtime security module.
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
- enable network tracer process event monitoring by setting in the system-probe config:
event_monitoring_config:
  network_process:
    enabled: true
- run system-probe with above config
- in a separate console/terminal on the same machine, run
DD_ENV=env DD_SERVICE=service DD_VERSION=version FOO=bar curl https://www.google.com
- query the system-probe for connections with
curl --unix-socket /opt/datadog-agent/run/sysprobe.sock http://unix/connections
(note the unix socket path could be different in your environment; check the system-probe log file for the path). You should see the `tags` field in the returned JSON set to
"tags": [
  "env:env",
  "version:version",
  "service:service"
],
On the connection entry for google.com, you should see:
"tags": [
  0,
  1,
  2
],
Reviewer's Checklist
- [x] If known, an appropriate milestone has been selected; otherwise the `Triage` milestone is set.
- [ ] Use the `major_change` label if your change either has a major impact on the code base, is impacting multiple teams or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer facing change use a releasenote.
- [x] A release note has been added or the `changelog/no-changelog` label has been applied.
- [ ] Changed code has automated tests for its functionality.
- [x] Adequate QA/testing plan information is provided if the `qa/skip-qa` label is not applied.
- [x] At least one `team/..` label has been applied, indicating the team(s) that should QA this change.
- [ ] If applicable, docs team has been notified or an issue has been opened on the documentation repo.
- [ ] If applicable, the `need-change/operator` and `need-change/helm` labels have been applied.
- [ ] If applicable, the config template has been updated.
For visibility, noting that this change requires kernel version 4.10 because CWS uses LRU maps.