Deployment fails on SELinux-enabled systems

Open ssbarnea opened this issue 4 years ago • 4 comments

RUNNING HANDLER [ansible-datadog : restart datadog-agent] ***************************************************************************************************************************
Saturday 30 November 2019  14:42:34 +0000 (0:00:00.026)       0:00:56.091 *****
fatal: [n0]: FAILED! => {
    "changed": false
}

MSG:

Unable to start service datadog-agent: Job for datadog-agent.service failed because the control process exited with error code.
See "systemctl status datadog-agent.service" and "journalctl -xe" for details.

Debug

Nov 30 14:42:42 n0 setroubleshoot[2653]: SELinux is preventing systemd from unlink access on the file process-agent.pid. For complete SELinux messages run: sealert -l f6319e9f-8584>
Nov 30 14:42:42 n0 platform-python[2653]: SELinux is preventing systemd from unlink access on the file process-agent.pid.

                                          *****  Plugin catchall (100. confidence) suggests   **************************

                                          If you believe that systemd should be allowed unlink access on the process-agent.pid file by default.
                                          Then you should report this as a bug.
                                          You can generate a local policy module to allow this access.
                                          Do
                                          allow this access for now by executing:
                                          # ausearch -c 'systemd' --raw | audit2allow -M my-systemd
                                          # semodule -X 300 -i my-systemd.pp

Incomplete solution

Running these commands resolves the SELinux denial, but the agent is still not functional:

-- Unit datadog-agent.service has finished shutting down.
Nov 30 14:45:45 n0 systemd[1]: datadog-agent.service: Start request repeated too quickly.
Nov 30 14:45:45 n0 systemd[1]: datadog-agent.service: Failed with result 'exit-code'.
Nov 30 14:45:45 n0 systemd[1]: Failed to start Datadog Agent.
-- Subject: Unit datadog-agent.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit datadog-agent.service has failed.
--
-- The result is RESULT.
Nov 30 14:45:45 n0 systemd[1]: Dependency failed for Datadog Trace Agent (APM).
-- Subject: Unit datadog-agent-trace.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit datadog-agent-trace.service has failed.
--
-- The result is RESULT.
Nov 30 14:45:45 n0 systemd[1]: datadog-agent-trace.service: Job datadog-agent-trace.service/start failed with result 'dependency'.
Nov 30 14:45:45 n0 systemd[1]: Dependency failed for Datadog Process Agent.
-- Subject: Unit datadog-agent-process.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
--
-- Unit datadog-agent-process.service has failed.
--
-- The result is RESULT.

ssbarnea, Nov 30 '19 14:11

I think I was able to narrow down the other issue:

Nov 30 14:55:09 n0 agent[12815]: Error: Failed to setup config unable to load Datadog config file: While parsing config: yaml: line 7: could not find expected ':'
Nov 30 14:55:09 n0 agent[12815]: Error: unable to set up global agent configuration: unable to load Datadog config file: While parsing config: yaml: line 7: could not find expected>

In that particular case the format of the api_key was wrong, so it counts as user error, but it uncovered a real role bug: the role returned success even though the agent failed to start. The last task in the role should ensure that the agent is running.
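
A final check of that kind could look roughly like the sketch below. This is not the role's actual code. It assumes a systemd host where the unit is named `datadog-agent.service` (as in the journal output above) and uses Ansible's built-in `meta`, `service_facts` and `assert` modules; the task names and the failure message are made up for illustration.

```yaml
# Hypothetical closing tasks for the role; not taken from ansible-datadog itself.
- name: Run any pending restart handlers before checking the service
  ansible.builtin.meta: flush_handlers

- name: Collect the current state of all services
  ansible.builtin.service_facts:

- name: Fail the play if datadog-agent is not running
  ansible.builtin.assert:
    that:
      # Assumes the systemd unit name seen in the journal output.
      - ansible_facts.services['datadog-agent.service'] is defined
      - ansible_facts.services['datadog-agent.service'].state == 'running'
    fail_msg: "datadog-agent is not running; see 'journalctl -u datadog-agent' for details."
```

A check like this can still pass if the agent exits shortly after starting, so a short pause or a retry before the assertion may be needed in practice.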

ssbarnea, Nov 30 '19 14:11

CONTRIBUTING.md

Sandradeb5, Dec 26 '19 03:12

Hello @ssbarnea,

I can't reproduce the role bug you described, where the role returns success even though the agent has failed to start. The role makes this check explicitly when you tell it to start the agent by setting the datadog_enabled attribute to true.
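
For reference, a minimal playbook that sets this attribute explicitly might look like the sketch below. The host name `n0` and the role name `ansible-datadog` are taken from the log output in this issue; `vault_datadog_api_key` is a hypothetical variable standing in for wherever the key is actually stored.

```yaml
# Illustrative playbook only; adjust hosts, role name and variables to your setup.
- hosts: n0
  become: true
  roles:
    - role: ansible-datadog
      vars:
        # Hypothetical vaulted variable; quoting keeps the key a plain YAML string.
        datadog_api_key: "{{ vault_datadog_api_key }}"
        # Tells the role to start the agent, which is when the start check applies.
        datadog_enabled: true
```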

Could you please send us your Datadog attributes, with private information (API keys, etc.) redacted? Could you also send us the log of your Ansible run?

Thanks a lot!

kbogtob, Jan 09 '20 14:01

@ssbarnea ping!

albertvaka, Feb 10 '20 14:02